1. Genetics and Genomics
Download icon

Pervasive epigenetic effects of Drosophila euchromatic transposable elements impact their evolution

  1. Yuh Chwen G Lee  Is a corresponding author
  2. Gary H Karpen  Is a corresponding author
  1. Lawrence Berkeley National Laboratory, United States
  2. University of California Berkeley, United States
Research Article
  • Cited 7
  • Views 1,391
  • Annotations
Cite as: eLife 2017;6:e25762 doi: 10.7554/eLife.25762

Abstract

Transposable elements (TEs) are widespread genomic parasites, and their evolution has remained a critical question in evolutionary genomics. Here, we study the relatively unexplored epigenetic impacts of TEs and provide the first genome-wide quantification of such effects in D. melanogaster and D. simulans. Surprisingly, the spread of repressive epigenetic marks (histone H3K9me2) to nearby DNA occurs at >50% of euchromatic TEs, and can extend up to 20 kb. This results in differential epigenetic states of genic alleles and, in turn, selection against TEs. Interestingly, the lower TE content in D. simulans compared to D. melanogaster correlates with stronger epigenetic effects of TEs and higher levels of host genetic factors known to promote epigenetic silencing. Our study demonstrates that the epigenetic effects of euchromatic TEs, and host genetic factors modulating such effects, play a critical role in the evolution of TEs both within and between species.

https://doi.org/10.7554/eLife.25762.001

eLife digest

The DNA inside an organism encodes all the instructions needed for the organism to develop and work properly. Organisms carefully organize and maintain their DNA (collectively known as the genome) so that the genetic information remains intact and the cell can understand the instructions. However, there are some pieces of DNA that are capable of moving around the genome. For example, pieces known as transposable elements can make new copies of themselves and jump into new locations in the genome. Most transposons do not appear to have any important roles, and in fact they are usually harmful to organisms. Despite this, transposons are present in the genomes of almost all species. The number of transposons in a genome varies greatly between individuals and species, but it is not clear why this is the case.

Organisms have evolved ways to limit the damage caused by transposons. For example, many cells package regions of DNA containing transposons into a tightly packed structure known as heterochromatin. However, this type of DNA packaging sometimes spreads to neighboring sections of DNA. This is a problem because cells are not usually able to read the information contained within heterochromatin. This means that transposons can prevent some instructions from being produced when they should be. Lee and Karpen used fruit flies to investigate to what extent transposons harm organisms by changing the way DNA is packaged, and whether this influences how transposons evolve.

The experiments show that that more than half of the transposons in fruit flies cause neighboring sections of DNA to be packaged into heterochromatin. This can negatively impact up to 20% of genes in the genome. As a result, transposons that have harmful effects on DNA packaging are more likely to be lost from the fly population during evolution than transposons that do not have harmful effects. Fruit fly species containing transposons that tend to package more neighboring sections of DNA into heterochromatin generally have fewer transposons than genomes containing less harmful transposons.

The findings of Lee and Karpen provide new insight as to why the numbers of transposons vary among organisms. The next challenge is to find out whether transposons that alter how DNA is packaged are also common in primates and other animals.

https://doi.org/10.7554/eLife.25762.002

Introduction

Transposable elements (TEs) are genetic elements that can copy and transpose themselves into new genomic locations. Even though there are incidental reports of potentially adaptive TEs (Daborn et al., 2002; Schlenke and Begun, 2004; Aminetzach et al., 2005; González et al., 2008; Schmidt et al., 2010; Mateo et al., 2014; Hof et al., 2016), they generally lower host fitness (Mackay, 1989; Pasyukova et al., 2004) and are widely recognized as ‘genomic parasites’. Despite the deleterious fitness consequences of TEs, they comprise appreciable and highly variable proportions of euchromatic genomes in all eukaryotes surveyed (Biémont, 2010; Elliott and Gregory, 2015; Chalopin et al., 2015). However, fundamental questions remain about the mechanisms that restrict the selfish increases in TE copy number and contribute to the wide variation of TEs within and between species.

Theoretical analyses predict that, in an outbreeding and meiotically recombining host population, copy number of TEs can be contained (i.e. reach an equilibrium) if the increase in copy number through transposition is counterbalanced by the removal of TEs (Charlesworth and Charlesworth, 1983; Langley et al., 1983). One possible mechanism for the containment of TEs is to regulate the transposition rate to equal the removal rate. Small RNAs in Drosophila, mammals, and plants are enriched for TE sequences and regulate the transposition of TEs in host germlines (Girard et al., 2006; Gunawardane et al., 2007; Brennecke et al., 2007; Aravin et al., 2007; Slotkin et al., 2009). These small RNAs guide the Ago and/or Piwi subfamilies of Argonaute proteins (reviewed in [Hutvagner and Simard, 2008]) to TE transcripts with complementary sequences, resulting in post-transcriptional silencing (reviewed in [Klattenhoff and Theurkauf, 2008; Girard and Hannon, 2008; Senti and Brennecke, 2010]). In addition, TEs can be transcriptionally silenced through small-RNA guided enrichment of repressive epigenetic marks, which include DNA modifications (such as methylation) and post-translational histone modifications (such as di- and tri-methylation of H3 lysine 9 (H3K9me2/3), [Klenov et al., 2007; Aravin et al., 2008; Sienski et al., 2012; Le Thomas et al., 2013]). Both post-transcriptional and transcriptional silencing mechanisms reduce the RNA and protein output from TEs, and accordingly lower TE transposition rate. However, despite the presence of small-RNA regulation, measured transposition rates of TEs are significantly higher than excision rates (reviewed in [Charlesworth and Langley, 1989]). Furthermore, euchromatic TE insertions in multiple outbreeding species have low population frequencies (Charlesworth and Langley, 1989; Dolgin et al., 2008; Lockton et al., 2008; González et al., 2008; Lockton and Gaut, 2010; Cridland et al., 2013; Kofler et al., 2015), and a reduction in transposition rate alone is unlikely to explain such observations.

Alternatively, selection against the deleterious effects of TEs has been theoretically proposed and empirically supported as a major force that removes TE insertions from host populations and shapes the population dynamics of TEs (reviewed in [Charlesworth and Langley, 1989; Lee and Langley 2010; Barrón et al., 2014]). It is well-established that TEs can be deleterious through their genetic effects, such as inserting into and disrupting genes and other functional elements (Cooley et al., 1988; Finnegan, 1992), acting as ectopic regulatory elements (Feschotte, 2008), and mediating ectopic recombination that results in detrimental chromosomal rearrangements (Langley et al., 1988; Montgomery et al., 1991; Petrov et al., 2003; Mieczkowski et al., 2006). On the other hand, TE insertions can also influence the epigenetic states of adjacent functional sequences, interfering with gene regulation (‘epigenetic effects’; reviewed in [Slotkin and Martienssen, 2007]). A genome-wide study in A. thaliana first established the associations between DNA methylation of TEs and lower transcript levels of adjacent genes (Hollister and Gaut, 2009). Later studies identified TEs as a major cause for DNA methylation-enriched regions in the A. thaliana genome (Ahmed et al., 2011; Schmitz et al., 2013; Dubin et al., 2015; Quadrana et al., 2016; Kawakatsu et al., 2016; Stuart et al., 2016), and several demonstrated that this association results from spreading of DNA methylation from epigenetically silenced TEs (Ahmed et al., 2011; Quadrana et al., 2016; Stuart et al., 2016). Associations between TEs and enrichment of repressive epigenetic marks were also documented in mouse cell lines (Rebollo et al., 2012) and maize (Eichten et al., 2012; West et al., 2014). However, only (Hollister and Gaut, 2009) explored the influences of these TE-induced enrichment of repressive epigenetic marks on the evolutionary dynamics of TEs.

In Drosophila, TE-induced enrichment of repressive epigenetic marks at functional elements in euchromatin was first solidly supported by comparing the epigenetic states of reporter genes in constructs with and without adjacent TEs (Sentmanat and Elgin, 2012). The same study also found that epigenetic effects of TEs depend on small-RNA targeting, and thus on host-directed transcriptional silencing of TEs. This spreading of repressive epigenetic marks from epigenetically silenced euchromatic TEs is reminiscent of the well-studied position-effect variegation (PEV), in which repressive epigenetic marks from pericentromeric or subtelomeric heterochromatin spread to juxtaposed euchromatic genes and cause stochastic gene silencing ([Gowen and Gay, 1934], reviewed in [Girton and Johansen, 2008; Elgin and Reuter, 2013]). The extent of PEV is influenced by several genetic factors, including the amount of heterochromatic DNA in a genome (reviewed in [Girton and Johansen, 2008]) and heterochromatic enzymatic and structural proteins whose hypomorphic or null mutations enhance or suppress PEV (known as E(var)s and Su(var)s respectively, [Elgin and Reuter, 2013; Swenson et al., 2016]). Likewise, the epigenetic effects of TEs on an adjacent reporter genes were observed to be contingent on the expression of two Su(var) genes (Sentmanat and Elgin, 2012).

Previously, we used D. melanogaster modEncode epigenomic data (Nègre et al., 2011) and demonstrated that histone H3K9me3, a key repressive epigenetic mark, is enriched around euchromatic TEs (Lee 2015). Importantly, TEs adjacent to genes that are highly enriched with H3K9me3 are more strongly selected against, supporting an important role for TE’s epigenetic effects in its own population dynamics (Hollister and Gaut, 2009; Lee and Langley 2010). Yet, several critical questions remain. For example, our previous study was based on the reference D. melanogaster strain (Adams et al., 2000), which has been maintained as a laboratory stock for many years, and is unlikely to be representative of natural populations. More importantly, single-strain analysis precluded distinguishing whether the enrichment of H3K9me3 at genes was due to TE-induced enrichment of repressive epigenetic marks, or the preferential insertions of TEs into genomic regions already enriched in repressive marks (Lee 2015).

To test the hypothesis that euchromatic TE insertions nucleate repressive epigenetic marks, here we exploit natural variation in the presence/absence of individual TE insertions in D. melanogaster populations. In this species, euchromatic insertions from most TE families segregate at low population frequencies (Charlesworth and Langley, 1989; Kofler et al., 2012, 2015; Cridland et al., 2013). Accordingly, randomly selected, unrelated individuals usually share few TE insertions. This will provide a direct comparison of epigenetic states at homologous sequences with and without the presence of TEs, and allow distinguishing the causal relationship between the presence of TEs and the enrichment of repressive epigenetic marks. Importantly, TEs only comprise 5.4% of the D. melanogaster euchromatic genome (Hoskins et al., 2015), and the epigenetic effects of individual TE insertions can thus be determined.

In this study, analyses of the epigenomes of two recently established, wild-derived, inbred D. melanogaster strains showed that euchromatic TEs are responsible for the enrichment of repressive epigenetic marks in flanking regions. Further, analysis of individual insertions revealed that more than half of euchromatic TEs are associated with epigenetic effects on flanking sequences, demonstrating their pervasive impact on the Drosophila genome. Importantly, we found evidence supporting stronger selection against TE insertions with more extensive epigenetic effects. Comparisons between the closely related D. melanogaster and D. simulans revealed that the epigenetic effects of TEs also vary between species, and correlate with variation in host genetic factors that regulate epigenetic silencing. Our results support that the epigenetic effects of euchromatic TEs, and host genetic factors that modulate these effects, play an important role in the population dynamics of TEs within and between species.

Results

Euchromatic TEs exhibit extensive epigenetic effects on adjacent sequences

In Drosophila, repressive histone modifications H3K9me2/3 (Kouzarides, 2007; Grewal and Elgin, 2007) and their cognate ‘reader’ protein Heterochromatin Protein 1a (HP1a) (Eissenberg and Elgin, 2014) play a dominant role in the initiation and maintenance of repressive chromatin states in heterochromatin. Our previous study showed that euchromatic sequences flanking TEs have strong enrichment for H3K9me3 in the reference D. melanogaster strain (Lee 2015). Using modEncode ChIP-seq data generated from the Oregon-R strain, we observed that sequences flanking euchromatic TEs are also enriched for another key heterochromatic histone modification (H3K9me2) and HP1a, while depleted for ‘active’ histone modifications H3K4me2 and H3K4me3, which are enriched at transcribing promoters (Kouzarides, 2007; Kharchenko et al., 2011) (Figure 1). Interestingly, enrichment for repressive epigenetic marks around TEs is strongest at the embryonic stage and weaker at later developmental stages, consistent with our previous study of only H3K9me3 (Lee 2015).

Epigenetic states of euchromatic sequences around TEs.

Euchromatic sequences around TEs are enriched for (A) repressive epigenetic marks (H3K9me2, H3K9me3, and HP1a), (B) and depleted for active epigenetic marks (H3K4me2 and H3K4me2) in Oregon-R. Different colors represent different developmental stages. Plots were generated using LOESS smoothing (span = 10%).

https://doi.org/10.7554/eLife.25762.003

To investigate if the enrichment of repressive epigenetic marks is TE-induced or results from the preferential insertion of TEs into regions already enriched for repressive epigenetic marks, we performed Chromatin Immuno-Precipitation and sequencing (ChIP-seq) on H3K9me2 using two inbred, wildtype D. melanogaster strains collected in North Carolina, USA (RAL315 and RAL360 from Drosophila Genetic Reference Panel or DGRP [Mackay et al., 2012]). These strains have been fully sequenced and annotated for the locations of euchromatic TE insertions (Rahman et al., 2015), allowing direct comparison of the epigenetic status of allelic regions with and without TEs. Our ChIP-Seq analyses of H3K9me2 distributions (see Materials and methods) in 4–8 hr RAL315 and RAL360 embryos, which contain fully-formed heterochromatin (Yuan and O'Farrell, 2016), only included TE insertions annotated with high confidence and unique to either strain (see Materials and methods). Importantly, we used highly conservative heterochromatin-euchromatin boundaries (0.5 Mb distal from epigenetically defined boundaries [Riddle et al., 2011]). This ensures that only euchromatic sequences and TEs were included in the analysis, and prevents confounding effects from pericentromeric or subtelomeric heterochromatin. Because the ChIP-Seq data were generated using whole animals that contain multiple cell types, combined with the stochastic nature of heterochromatic silencing, H3K9me2 enrichment reflects the average epigenetic states of all cells in the samples. Accordingly, we analyzed the enrichment of H3K9me2 quantitatively, instead of as binary states (see Materials and methods).

We compared the epigenetic states of euchromatic sequences around all TE insertions present in one strain with those of homologous alleles lacking the TE insertions in the other strain. The presence of TEs correlated with substantially higher H3K9me2 enrichment (Figure 2), which strongly supports the conclusion that these repressive mark enrichments are due to TE insertions, and not pre-existing epigenetic states. To quantify the epigenetic effects of individual TEs, we compared H3K9me2 fold enrichment in strains with and without a TE using non-overlapping 1 kb windows around each TE insertion (Figure 2—figure supplement 1). A TE was counted as having epigenetic effects if the H3K9me2 enrichment level was significantly higher in the strain with the TE than the other strain in the 0–1 kb windows flanking the TE insertion. We also estimated the ‘extent of H3K9me2 spread’ from the TE insertion (the farthest window in which H3K9me2 enrichment was consecutively and significantly higher in the strain with the TE) and the ‘% increase in H3K9me2 enrichment’ (the difference in H3K9me2 enrichment between the two strains in 0–1 kb windows; see Materials and methods and Figure 2—figure supplement 1).

Figure 2 with 3 supplements see all
Euchromatic sequences around TE insertions are enriched for H3K9me2.

Levels of H3K9me2 enrichment were compared between homologous sequences of two D. melanogaster strains. Left: sequences around TEs in strain RAL315 that are absent in RAL360. Right: sequences around TEs in strain RAL360 that are absent in RAL315. H3K9me2 fold enrichment was averaged over all euchromatic sequences flanking the analyzed TEs. Plots were generated using LOESS smoothing (span = 10%). Upper figures show ±50 kb around TE insertions, while lower figures show expanded views of ±20 kb.

https://doi.org/10.7554/eLife.25762.004

Surprisingly, more than half of the euchromatic TEs (54.2% of 419 TEs analyzed) were associated with enrichment for H3K9me2 in at least 1 kb of adjacent sequences. The TE-induced spreading of H3K9me2 in flanking sequence extended for a mean of 4.50 kb (standard deviation 4.59 kb), and an average of 79.8% increase in H3K9me2 enrichment at the TE insertion site (standard deviation 78.8%, Figure 2—source data 1). These observations revealed that the epigenetic effects of euchromatic TEs in D. melanogaster are not only pervasive and extensive, but also highly variable between TE insertions.

Previous investigations using randomly inserted transgenic constructs in D. melanogaster found that the epigenetic effects of TEs depend on proximity to pericentromeric or subtelomeric heterochromatin (Sentmanat and Elgin, 2012), and on local repeat density (Huisinga et al., 2016). Our analysis focused on regions far from these heterochromatic regions, and showed that TEs associated with H3K9me2 enrichment at flanking sequences are not concentrated around pericentromeric or subtelomeric heterochromatin (Figure 2—figure supplement 2). Also, local repeat density does not differ between TEs that are or are not associated with H3K9me2 spreading (Mann-Whitney U test, p=0.55). Similarly, we observed no correlations between the extent or magnitude of TE’s epigenetic effects and local repeat density (Spearman rank correlation test, p=0.81 (repeat density vs extent of H3K9me2 spread), 0.65 (repeat density vs % increase in H3K9me2)). These results demonstrate that epigenetic influences of TEs are not restricted to specific genomic locations or contexts, and can be observed across diverse euchromatic regions.

TE families of LTR-type and targeted by piRNAs show stronger epigenetic effects

While our results demonstrate that euchromatic TEs have widespread epigenetic effects in D. melanogaster, we also found that the epigenetic effects of individual TE insertions vary significantly. In particular, there is substantial variation in the epigenetic effects of insertions from different TE families (Figure 3). Many biological properties differ between TE families, including transposition mechanism (Wicker et al., 2007), genome abundance (Kaminker et al., 2002; Quesneville et al., 2005), and targeting by small RNAs (Gunawardane et al., 2007; Brennecke et al., 2007, 2008; Ghildiyal et al., 2008; Czech et al., 2008). We investigated which properties are associated with stronger epigenetic effects of insertions from a TE family.

Variation in the epigenetic effects of different TE families.

There is substantial variation in the (A) proportion of TEs with epigenetic effects, (B) mean extent of H3K9me2 spread, and (C) mean % increase in H3K9me2 enrichment of the TE families analyzed. Different colors denote different types of TEs. The number of observations for each TE family is in parenthesis in (A).

https://doi.org/10.7554/eLife.25762.009

Based on transposition mechanisms, there are three major types of TE families: Long Terminal Repeats (LTR) retrotransposons, non-LTR retroposons, and Terminal Inverted Repeats (TIR) transposons. An immediately obvious pattern is that LTR-type TE families seem to have the strongest epigenetic effects. The LTR copia family has the largest proportion of insertions with epigenetic effects, and LTR roo insertions display both the most extensive average spread of H3K9me2 and the largest average increase in H3K9me2 enrichment in flanking sequences (Figure 3). Similarly, eight of 11 TE families in which over half of analyzed insertions showed epigenetic effects are LTR-type, while the remaining three are TIR-type. The two TE families with >5 kb average spread of H3K9me2 and the four families that yield >50% mean increase in H3K9me2 enrichment are all LTR-type families. To formally test if LTR-type TE families have stronger epigenetic effects than other types of TE families, we estimated the proportion of TEs with epigenetic effects, the average extent of H3K9me2 spread, and average % increase in H3K9me2 enrichment of TE insertions from each TE family and compared these metrics between LTR-type and other types of TE families. Indeed, LTR-type TE families show a larger increase in H3K9me2 enrichment compared to other types of TEs (Mann-Whitney U test p=0.00047, median: 0.547 (LTR) vs 0.352 (others), Figure 4). The other two indexes are not significantly different, likely due to the high heterogeneity between LTR-type TE families (Figure 4).

Figure 4 with 1 supplement see all
Quantitative analysis of the epigenetic effects of different types of TE families.

While there are no significant differences in (A) the proportion of TEs with epigenetic effects and (B) the mean extent of H3K9me2 spread, (C) TE insertions of LTR-type families lead to significantly higher mean % increase of H3K9me2 enrichment in flanking sequences. Note that each data point represents one TE family. (*** Kruskal-Wallis test p<0.005).

https://doi.org/10.7554/eLife.25762.010

In Drosophila, TEs are targeted by two types of small RNAs: piRNAs in the germline (Gunawardane et al., 2007; Brennecke et al., 2007) and endo-siRNAs in the soma (Ghildiyal et al., 2008; Czech et al., 2008). The epigenetic silencing of TEs in the germline and early embryo, which is maintained through development (Gu and Elgin, 2013), depends on piRNAs (Klenov et al., 2007; Sentmanat and Elgin, 2012; Le Thomas et al., 2013), while the role of endo-siRNAs in epigenetic silencing of TEs is currently less clear. Consistently, we observed that TE families targeted by more piRNAs show more extensive H3K9me2 spreading and enrichment in flanking sequences (Table 1). It is worth noting that there is no difference in the amount of piRNAs targeting LTR-type TEs compared to other major types of TEs (Mann-Whitney U test, p=0.19 (wK) and 0.39 (w1118)), suggesting that the observed correlation between the amount of piRNAs and TE’s epigenetic effects was unlikely solely driven by stronger epigenetic effects of LTR-type TE families. On the other hand, we did not find significant associations between the epigenetic effects of TEs and targeting by endo-siRNAs (Table 1).

Table 1

Spearman rank correlation tests between properties of TE families and the epigenetic effects of TEs. piRNA amounts were estimated from two studies (two genotypes: w1118 and wK) and siRNA counts were estimated from two studies (Ghildiyal et al., 2008; Czech et al., 2008).

https://doi.org/10.7554/eLife.25762.012
prop. TE with epigenetic effectsmean extent of H3K9me2 spreadmean % of increase in H3K9me2 enrichment
p-valueρp-valueρp-valueρ
piRNA amount (w1118)5.18E-010.1211.67E-020.4651.13E-020.493
piRNA amount (wK)9.99E-010.0003.41E-030.5537.09E-030.521
siRNA counts (Czech et al., 2008)2.90E-010.1934.99E-010.1421.24E-010.316
siRNA counts (Ghildiyal et al., 2008)6.08E-010.1087.46E-01−0.0751.46E-010.329
family copy no.3.61E-030.4736.24E-010.0956.59E-010.085

It has been observed that insertions of abundant TE families are under stronger purifying selection than those of less abundant TE families, and several mechanisms were proposed to account for this copy-number dependency (reviewed in [Barrón et al., 2014]). Because the generation of piRNAs involves TE transcripts (Gunawardane et al., 2007; Brennecke et al., 2007), it was predicted that for a given TE family the epigenetic effects of TEs, and the associated strength of selection that influences the population dynamics of TEs, should also depend on TE copy number (Lee and Langley 2010; Lee 2015). Supporting this prediction, TE families with higher copy numbers in a large sample of African flies (Kofler et al., 2015) have larger proportions of insertions with epigenetic effects (Table 1). This strong correlation is not driven by TE families with exceptional abundance (Figure 4—figure supplement 1), because the removal of those TE families does not qualitatively change the results (Spearman rank ρ = 0. 46, p=0.0058). In summary, TE families of LTR-type, targeted by larger amounts of piRNAs, or of higher abundance display stronger epigenetic effects on adjacent sequences than other TE families.

TEs with epigenetic effects are more strongly selected against

Given the high density of genes and other functional elements in Drosophila (modENCODE Consortium et al., 2010), H3K9me2 spreading from TEs to adjacent sequences is expected to have functional consequences. Accordingly, TE insertions with epigenetic effects should more likely be selected against and have lower population frequencies than TEs without H3K9me2 spreading.

Population genomic analysis indicated that Zambia is the likely ancestral origin of D. melanogaster, and Zambian populations have limited admixture from non-African genomes (Pool et al., 2012; Lack et al., 2015). Demographic history should thus have less effect on the analysis of TE population frequencies in the Zambian population compared to non-ancestral populations. Accordingly, we used genome sequences of a Zambian D. melanogaster population (Lack et al., 2015) to determine the population frequencies of individual TE insertions in the two DGRP strains analyzed (RAL315 and RAL360), which were first collected in North America. Consistent with previous genome-wide observations that most TE insertions have low population frequencies in D. melanogaster (González et al., 2008; Kofler et al., 2012, 2015; Cridland et al., 2013), only 31.5% of TE insertions present in either of the two DGRP strains analyzed were found in the Zambian population, and these TEs displayed very low population frequencies (0.54% (first quartile), 0.56% (median), 1.61% (third quartile), Figure 5—figure supplement 1). We categorized TE insertions in the two DGRP strains according to their presence in the Zambian population (‘high frequency’ – present, ‘low frequency’ – not present). Low frequency TEs were more likely to exhibit spreading of H3K9me2 (Fisher’s Exact Test, p=0.039, odds ratio = 1.58, Figure 5A), led to more extensive spreading (Mann-Whitney U test, p=0.011, Figure 5B and Figure 5—figure supplement 2A), and resulted in a larger increase in H3K9me2 enrichment (Mann-Whitney U test, p=0.014, Figure 5C and Figure 5—figure supplement 2B). Consistently, by analyzing the population frequencies of individual TE insertions, we observed significant negative correlations between the strength of a TE’s epigenetic effects and its population frequency (Spearman rank ρ = −0.15 (extent of H3K9me2 spread) and −0.14 increase in H3K9me2), p<0.005 for both, Figure 5—figure supplement 3).

Figure 5 with 3 supplements see all
TEs with different population frequencies show different strength of epigenetic effects.

TEs with low population frequencies are (A) more likely to show spread of H3K9me2, (B) result in more extensive spread of H3K9me2, and (C) lead to a larger increase in H3K9me2 enrichment. (*Mann-Whitney U test, p<0.05).

https://doi.org/10.7554/eLife.25762.013

A potential confounding factor for our observation is that the population frequencies of TE insertions vary between TE families (i.e. insertions from specific TE families tend to have high/low population frequencies, [Petrov et al., 2011; Kofler et al., 2012, 2015]). Thus, ‘low’ and ‘high’ frequency categories of TE insertions could be comprised of insertions from different TE families, whose variation in population frequencies could be due to factors other than the differential strength of selection removing TE insertions (Blumenstiel, 2011; Blumenstiel et al., 2014). To address this issue, we performed multiple regression analyses that jointly consider the impact of TE’s epigenetic effects and family identity on the population frequencies of TEs (see Materials and methods). Because most TEs in the two DGRP strains analyzed were not detected in the Zambian population (Figure 5—figure supplement 1), we treated the frequency of TE insertions (the number of individuals in which a TE insertion is present in the Zambian population) also as dichotomous variable (‘high frequency’ TE or not, see Materials and methods). Even accounting for the effect of TE family identity, the regression coefficients for TE’s epigenetic effects on population frequencies are still negative for all the regression models analyzed, and are statistically significant for a majority of the models (Table 2), suggesting that TE family identity is unlikely a major contributor for the negative associations between TE’s epigenetic effects and population frequencies.

Table 2

Regression analysis for the associations between TE’s epigenetic effects and population frequencies while accounting for the influence of TE family identity. Population frequencies of individual TE insertion (response variable) were modeled as either dichotomous variable (‘high frequency’ TE or not) or count (TE count). Because the distribution of TE count is overdispersed, TE count was modeled as either ‘quasipoission’ or ‘negative binomial’ in regression analyses. The influence of TE family identity was treated as either fixed or random effect. Also see Table 2—source data 1 for regression coefficients for all TE families.

https://doi.org/10.7554/eLife.25762.017
Extent of spreadMagnitude of spread
Response variableFamily identityp-valueRegression coefficientp-valueRegression coefficient
‘high frequency’ TE or notfixed effect4.72E-01−0.0293.37E-01−0.246
random effect1.58E-01−0.0494.83E-02−0.409
TE count (quasipoisson)fixed effect4.00E-03−0.1884.73E-03−1.121
random effect3.20E-03−0.1361.71E-04−1.400
TE count (negative binomial)fixed effect5.25E-04−0.1512.31E-04−1.041
random effect9.19E-05−0.1385.49E-05−0.986

An alternative explanation for the observed negative associations between TE’s epigenetic effects and population frequencies is that TEs without epigenetic effects tend to occur in regions of low meiotic recombination. TE insertions in regions with low meiotic recombination are repeatedly observed to have higher population frequencies than TEs in other genomic regions (Charlesworth and Lapid, 1989; Charlesworth et al., 1992; Bartolomé and Maside, 2004; Kofler et al., 2012; Cridland et al., 2013). A lower probability of recombination between TE insertions at different genomic locations (ectopic exchange [Langley et al., 1988; Montgomery et al., 1991]) and/or reduced efficacy of selection against TEs due to selective interference (Hill and Robertson, 1966; Felsenstein, 1974) have been proposed to account for these observations (reviewed in [Charlesworth and Langley, 1989; Lee and Langley 2010; Barrón et al., 2014]). However, we observed that recombination rates do not differ between TEs with or without spreading of H3K9me2 (Mann-Whitney U test, p=0.83). Similarly, neither the extent of H3K9me2 spread nor the increase in H3K9me2 enrichment in flanking sequences was correlated with the local recombination rate for individual TE insertions (Spearman rank correlation test, p=0.62 (recombination rate vs extent of H3K9me2 spread) and 0.55 (recombination rate vs % increase in H3K9me2 enrichment)). It is unlikely that variation in recombination rates can account for the observations that TEs with stronger epigenetic effects have lower population frequencies. Overall, these results strongly support the proposed selection against the epigenetic effects of TEs.

Epigenetic effects of TEs result in differential epigenetic states of adjacent coding genes

We hypothesized that selection against TEs with epigenetic effects result from the associated functional consequences, in particular influences on the epigenetic states of adjacent functional elements. To investigate the predicted epigenetic influence of TEs on adjacent genes, we categorized euchromatic protein coding genes according to their shortest distance to a TE (0–1 kb, 1–2 kb, 2–5 kb, 5–10 kb, and no TE within 10 kb; see Materials and methods). Within each of the two strains analyzed, genes in proximity to TEs are more enriched for H3K9me2 (Figure 6—figure supplement 1), consistent with previous observations (Lee 2015).

To investigate the influence of TEs on the epigenetic states of homologous alleles, we calculated a z-score that compares the H3K9me2 enrichment of genic alleles with and without TEs located within 10 kb (see Materials and methods). The absolute value of the z-score reports the magnitude of differences in H3K9me2 enrichment between homologous alleles in the two strains, and the sign indicates if the allele with adjacent TEs has higher H3K9me2 enrichment (positive: yes, negative: no). As expected, genes with adjacent TEs in either strain have significantly higher, positive z-scores compared to genes distant from TEs in both strains (Figure 6A). We further investigated if the differential epigenetic states between homologous alleles depend on TE-induced epigenetic effects, or only on the presence of TEs. For all categories of genes within 10 kb from TEs, z-scores are significantly higher for genes whose neighboring TEs exhibit H3K9me2 spreading (Figure 6B). Consistently, there are significant positive correlations between the z-scores of genes and the extent of epigenetic effects from the nearest TEs (vs. the extent of H3K9me2 spread: Spearman rank ρ = 0.31, p<10−15; vs. % increase in H3K9me2 enrichment: Spearman rank ρ = 0.30, p<10−15). It is worth noting that genes whose nearest TEs did not exhibit epigenetic effects have similar z-scores to genes without TEs within 10 kb (dashed line, Figure 6B). These observations demonstrate that the spread of repressive epigenetic marks from euchromatic TEs leads to substantial epigenetic differences at homologous alleles of adjacent coding genes.

Figure 6 with 1 supplement see all
The epigenetic effects of TEs on adjacent protein coding genes.

(A) Alleles with adjacent TEs have higher H3K9me2 enrichment compared to homologous alleles in the strain that lacks adjacent TEs, as indicated by positive z-scores (see text), and the strength of the effect decreases with distance from TEs. (B) Genes adjacent to TEs with epigenetic effects show stronger differential enrichment for H3K9me2 than genes adjacent to TEs without epigenetic effects. (C) Genes adjacent to low frequency TEs with epigenetic effects, which likely experienced stronger selection against them, show stronger differential enrichment of H3K9me2 than genes adjacent to high frequency TEs with epigenetic effects (Mann-Whitney U test, *p<0.05, **p<0.01, ***p<0.001).

https://doi.org/10.7554/eLife.25762.019

TE insertions with stronger enrichment of repressive marks in adjacent functional alleles should lead to more deleterious functional consequences. We predict these TEs should be under stronger purifying selection and have lower population frequencies than other TEs. To test this hypothesis, we further restricted the analysis to genes whose nearest TEs show spreading of H3K9me2. Among these genes, those whose nearest TEs were absent in the Zambian population (‘low frequency’ TEs, see above) have significantly higher z-scores than genes adjacent to ‘high frequency’ TEs (Mann-Whitney U test, p=0.0019, median: 0.95 (genes near low frequency TEs) vs 0.32 (genes near high frequency TEs)). The observed differences are most prominent for genes within 1 kb of TEs (Figure 6C). Consistently, there is a significant negative correlation between the z-score of a gene and the population frequency of its nearest TE (Spearman rank ρ = −0.18, p<10−3).

A potential functional consequence of H3K9me2 enrichment is reduced transcript levels of influenced alleles. To address this possibility, we performed RNA-seq of developmental stage-matched embryos. Within either strain, there are indeed significant negative correlations between H3K9me2 enrichment and transcript levels for genes within 10 kb of TEs (Spearman rank ρ = −0.35 (RAL315) and −0.33 (RAL360), p<10−16 for both). To compare the differential epigenetic states and transcript levels between homologous alleles, we calculated fold changes in expression and z-scores for H3K9me2 enrichment level using RAL 360 allele as reference (Figure 7A). Note this is different from the z-score used above, which uses the allele without TE as reference. We found an excess number of genes that support the influence of TE’s epigenetic effects on gene expression (higher H3K9me2 enrichment and lower expression of alleles with adjacent TEs, shaded green area in Figure 7A) for genes with TEs within 10 kb when compared to other genes in the genome (Figure 7B, upper 2 × 2 Tables). Restricting the analysis to TEs with epigenetic effects produced an even larger proportion of genes whose TE-neighboring alleles have higher H3K9me2 enrichment and lower expression (Figure 7B, bottom 2 × 2 Tables). Intriguingly, the excess number of genes supporting TE-induced epigenetic effects on expression was mainly observed for one of the two strains analyzed (RAL360). Furthermore, while we found a weak, but significant, negative correlation between z-scores for H3K9me2 enrichment and fold changes in expression for genes without TEs (Spearman rank ρ = −0.035, p=0.0084), there are no such correlations observed for genes with TEs within 10 kb (Spearman rank test p=0. 57 (RAL315) and 0.16 (RAL360)). In fact, there are multiple genes whose alleles associated with TEs have higher enrichment of H3K9me2, but also higher expression (e.g. arrows in Figure 7A). These observations suggest that the influence of TE-induced epigenetic states on gene expression may be more complex (see Discussion).

Differential H3K9me2 enrichment and RNA transcript levels of protein coding genes with and without adjacent TE insertions.

(A) The z-score for H3K9me2 enrichment (X-axis) and log2 expression fold change (Y-axis) were plotted for euchromatic protein coding genes without TEs within 10 kb (‘neither’) and for genes with adjacent TEs in either strain. It is worth noting that both H3K9me2 z-score and log2 expression fold change used RAL360 as reference. Shaded green areas are genes displaying the expected negative influence of TE’s epigenetic effects on gene expression (i.e. alleles adjacent to TEs have higher H3K9me2 enrichment and lower RNA transcript levels), while shaded orange areas are all other cases of epigenetic states and transcript levels. For each sub-plot, the numbers of genes (blue, pink, or gray dots) in each quarter are shown in black, and the numbers of genes whose nearest TEs with epigenetic effects (blue dots) are shown in blue. (B) Left: 2 × 2 contingency table for comparing the number of genes supporting the influence of TE’s epigenetic effects on gene expression (shaded green) and the number of other genes (shaded orange), against those for genes without TEs within 10 kb (‘neither’). Middle and right: 2 × 2 contingency tables for testing if there is an excess number of genes with TEs in RAL315 (middle) and in RAL360 (right) supporting the influence of TE-induced epigenetic effects on gene expression.

https://doi.org/10.7554/eLife.25762.021

Stronger epigenetic effects of TEs in D. simulans compared to D. melanogaster

D. simulans diverged from D. melanogaster only four million years ago (Obbard et al., 2012), yet is widely observed to harbor fewer TE insertions compared to D. melanogaster (Dowsett and Young, 1982; Vieira et al., 1999; Vieira and Biémont, 2004; Kofler et al., 2015). We hypothesized that variation in the epigenetic effects of TEs, and thus strength of selection against them, contributes to this between-species difference in TE content. To test this hypothesis, we performed H3K9me2 ChIP-seq on 4–8 hr embryos from the D. simulans reference strain. Similar to D. melanogaster, we observed strong H3K9me2 enrichment in sequences flanking TEs (Figure 8A), and genes adjacent to TEs have higher H3K9me2 enrichment than genes distant from TEs (Figure 8—figure supplement 1). Furthermore, genes adjacent to TEs with epigenetic effects (see below) have higher H3K9me2 enrichment than genes adjacent to TEs without epigenetic effects (Figure 8—figure supplement 1). For genes within 10 kb of TEs, there is also a strong negative correlation between H3K9me2 enrichment and transcript levels (Spearman rank ρ = −0.45, p<10−16), supporting a functional consequence of H3K9me2 enrichment in D. simulans.

Figure 8 with 2 supplements see all
D. simulans TEs show stronger epigenetic effects than D. melanogaster TEs.

(A) Enrichment of H3K9me2 is also observed at sequences adjacent to euchromatic TEs in D. simulans. (B) Compared to insertions of the same TE family in D. melanogaster, insertions in D. simulans are more likely to show epigenetic effects (proportion of TE spread) and a larger increase in relative H3K9me2 fold enrichment in adjacent sequences. FE: fold enrichment, D. mel: D. melanogaster, D. sim: D. simulans.

https://doi.org/10.7554/eLife.25762.022

To compare the epigenetic effects of TEs in D. melanogaster and D. simulans, the H3K9me2 enrichment at sequences flanking TEs were estimated relative to the median fold enrichment level at sequences 20–40 kb away from TEs in both species (see Materials and methods). Current TE annotations are D. melanogaster-centric and it is likely that our analysis missed TE families and/or variants that are D. simulans-specific. Accordingly, we restricted comparisons to TE families that have at least two insertions in both species. While there are no significant between-species differences in the extent of H3K9me2 spread (paired MWU test, p=0.67), TE families in D. simulans display a larger proportion of TEs with epigenetic effects (paired MWU test, p=0.013), and a larger increase in relative H3K9me2 fold enrichment (paired MWU test, p=0.035; Figure 8B). It is worth noting that only high confidence TE calls were included in the analysis, and only 14 families had at least two copies in both species. The roo family has the highest proportion of TEs with epigenetic effects in both species (94.4% and 86.5% in D. melanogaster and D. simulans, respectively), while the 1360 family has the largest differences between the two species (14.3% and 77.8% in D. melanogaster and D. simulans, respectively). These results demonstrate that TEs in D. simulans exhibit stronger epigenetic effects on flanking sequences compared to D. melanogaster.

Variation in genetic modifiers of PEV correlates with differences in the epigenetic effects of TEs between species

The extent of heterochromatin-mediated gene silencing (e.g. PEV) depends on several genetic modifiers, in particular the amount of heterochromatic DNA in a genome (reviewed in [Girton and Johansen, 2008]), and the dosage of several Su(var) and E(var) genes, whose wildtype proteins enhance and weaken PEV respectively (Elgin and Reuter, 2013; Swenson et al., 2016). The prevailing model is that altering the ratio of heterochromatin targets (heterochromatic DNAs) and regulators (Su(var) and E(var) proteins) influences heterochromatin nucleation and spreading (Locke et al., 1988). For example, lower amounts of heterochromatic DNA result in increased levels of heterochromatic Su(var) proteins in other regions, and accordingly enhance PEV. Because both PEV and the epigenetic effects of TEs are mediated through spreading of the same repressive epigenetic marks (H3K9me2/3 and HP1a), the epigenetic effects of TEs may depend on similar PEV modifiers. Indeed, a limited survey using reporter constructs demonstrated that the epigenetic effects of TEs depend on the expression of HP1a (Su(var)205) and Su(var)3–9, which binds and catalytically generates H3K9me2/3 marks, respectively (Sentmanat and Elgin, 2012). We thus predicted that variation in the epigenetic effects of TEs within and between species, and accordingly genomic abundance of TE insertions, could be due to differences in the amounts of heterochromatic DNA and/or modifier proteins.

We investigated the hypothesis that stronger epigenetic effects of TEs in D. simulans are associated with lower amounts of heterochromatic DNA or altered expression of Su(var)s/E(var)s. In Drosophila, heterochromatic DNA consists of simple repeats and degenerate TEs (Hoskins et al., 2007, 2015). We first identified short repeats (12-mers) that are enriched in heterochromatic regions by performing K-mer analysis of H3K9me2 ChIP-Seq data (see above), and then quantified the amount of identified H3K9me2-enriched 12-mers in these two species using published genomic sequencing data ((Kofler et al., 2015), see Materials and methods). Consistent with previous quantitation of simple repeat content using orthogonal approaches (melting curves [Lohe and Brutlag, 1987] or flow cytometry [Bosco et al., 2007]), we observed lower amounts of H3K9me2-enriched simple repeats in D. simulans compared to D. melanogaster (Figure 9A, ANOVA p-value=0.00013 (species) and <10−6 (library preparation method)).

Figure 9 with 1 supplement see all
Variation in genetic modifiers of PEV in D. melanogaster and D. simulans.

(A) D. simulans has higher normalized amounts of H3K9me2-enriched 12-mers than D. melanogaster. Raw amounts of H3K9me2-enriched 12-mers were normalized with sequencing coverage in each sample before comparisons (see Materials and methods). Different library preparation methods (see [Kofler et al., 2015]) are denoted with dots of different colors. (B) Compared to genome-wide distributions (shaded gray), known Su(var) genes as a group (orange, 40 genes in total) have higher expression in D. simulans than in D. melanogaster. Positive z-score represents lower expression rank (i.e. higher expression) in D. simulans than in D. melanogaster. Dashed vertical lines represent the top and bottom 5% of transcript level differences genome-wide. (C) Z-score for differences in transcript levels of ten known dosage-dependent E(var) genes (green), Su(var) genes (orange), and histone methyltransferase genes (also Su(var)s) between D. melanogaster and D. simulans are denoted as vertical lines and compared to genome-wide distributions (shaded gray). Dashed vertical lines indicate top and bottom 5% of transcript level differences genome-wide.

https://doi.org/10.7554/eLife.25762.026

To test if the expression of Su(var)s and E(var)s varies between the two species, we estimated z-scores for between-species differences in expression rank (high expression – low rank), using D. simulans as reference (i.e. positive z-socre: higher expression in D. simulans; negative z-score: higher expression in D. melanogaster). Compared to other genes in the genome, 40 known Su(var)s, as a group, have higher expression in D. simulans than in D. melanogaster (Kolmogorov–Smirnov test, p<10−6, Figure 9B; also see Figure 9—figure supplement 1 and Figure 9—source data 1 for individual genes included in the analysis). The small number of known E(var)s (five) precluded us from drawing any solid conclusions (Figure 9—figure supplement 1). Several Su(var)s/E(var)s are known to have dosage-dependent effects on heterochromatin silencing (Elgin and Reuter, 2013; Swenson et al., 2016). Among these dosage-dependent Su(var)s/E(var)s, Su(var)3–9 showed significantly higher expression in D. simulans than in D. melanogaster (Figure 9C). Overall, we found that D. simulans has lower amounts of H3K9me2-enriched simple repeats and higher expression of Su(var)s compared to D. melanogaster, both of which could account for the stronger epigenetic effects of TEs observed in D. simulans.

Discussion

Despite the presence of TEs in virtually all eukaryotic genomes surveyed, there is wide variation in euchromatic TE content among species, demonstrated by significant differences in copy number (Clark et al., 2007; Biémont, 2010; Chalopin et al., 2015), frequency spectra (Lockton and Gaut, 2010; Agren et al., 2014; Kofler et al., 2015), predominant types of TEs (Vieira and Biémont, 2004; Chalopin et al., 2015; Kofler et al., 2015), and species-specific TE families (Daniels et al., 1990; Lohe et al., 1995; Mills et al., 2006). Understanding the causes for such variation is critical for evaluating the impacts of various evolutionary forces on the population dynamics of TEs. Selection against the deleterious effects of TEs has been theoretically proposed (Charlesworth and Charlesworth, 1983) and empirically supported (reviewed in [Charlesworth and Langley, 1989; Lee and Langley 2010; Barrón et al., 2014]) as a dominant force restricting the selfish increase of TEs, and shaping variation of TEs within and between species. Differences in effective population size ([Lynch and Conery, 2003; Lockton et al., 2008], but see [Charlesworth and Barton, 2004; Groth and Blumenstiel, 2017]), mating systems (Charlesworth and Charlesworth, 1995; Wright and Schoen, 1999; Dolgin et al., 2008; Agren et al., 2014), and modes of reproduction (e.g. asexual vs sexual, [Arkhipova and Meselson, 2000]), were suggested to influence the efficacy of selection against TEs, and thus result in divergence in euchromatic TE content between species. In addition, differential exposure to opportunities for horizontal transfer might also contribute to variation in TE profiles (Clark et al., 2007; Groth and Blumenstiel, 2017).

In this study, we analyzed the epigenetic influence of TEs and investigated the role of such effects in the population dynamics of TEs. By comparing the epigenome of two D. melanogaster strains with divergent TE insertion positions, we conclude that the enrichment of repressive epigenetic marks around euchromatic TEs is due to the presence of TEs, instead of the preferential insertions of TEs into genomic regions already enriched for repressive epigenetic marks. Surprisingly, quantification of the epigenetic effects of individual TE insertions in D. melanogaster revealed that more than half of the euchromatic TEs analyzed are associated with at least 1 kb spread of repressive epigenetic marks, with an average spread of 4.5 kb from TEs that display epigenetic effects. In contrast, repressive DNA methylation from epigenetically silenced TEs in A. thaliana predominantly spreads only a few hundred base pairs (Quadrana et al., 2016; Stuart et al., 2016). Since ~20% of the euchromatic genes are within 5 kb from at least one TE insertion in the reference D. melanogaster strain (Kaminker et al., 2002; Quesneville et al., 2005; Hoskins et al., 2015), these estimates suggest that the epigenetic influence of TEs on functional sequences is extensive in D. melanogaster. Indeed, we observed strong positive associations between the presence of TEs with epigenetic effects and the enrichment of repressive epigenetic marks in adjacent genic alleles, when compared to homologous alleles lacking a neighboring TE insertion. This concurs with observations in Arabidopsis that TEs contribute significantly to genic DMRs (differential methylated regions) between genomes (Schmitz et al., 2013; Quadrana et al., 2016; Stuart et al., 2016). In addition, we found substantial variation in the epigenetic effects among TE families, which could be due to differences in TE types, targeting by piRNAs, and/or the abundance of TE families.

It was proposed that heterochromatin assembly depends on the concentration of essential heterochromatic enzymatic and structural proteins, whose concentration is highest in heterochromatin and decreases with increasing distance (‘mass action model’, [Locke et al., 1988]). This model can explain the spread of repressive epigenetic marks from pericentromeric or subtelomeric heterochromatin to juxtaposed euchromatic genes in PEV (reviewed in [Girton and Johansen, 2008; Elgin and Reuter, 2013]), and likely the enrichment of repressive epigenetic marks at sequences adjacent to euchromatic TEs. Consistent with predictions of this model, previous (Lee 2015) and current analyses both found that H3K9me2/3 enrichment decays with distance from TEs. The enrichment of repressive epigenetic marks could also result from TE-induced formation of de novo piRNA-generating loci that include TE-flanking sequences (Shpiz et al., 2014). piRNAs from these piRNA-generating loci would accordingly initiate epigenetic silencing of TE-flanking euchromatic sequences.

TE-induced reduction in expression of neighboring genes has been demonstrated in Drosophila (Cridland et al., 2015; Lee 2015), and the spreading of repressive epigenetic marks from TEs is a plausible mechanism for such observations. We observed an excess of TE-flanking alleles with higher H3K9me2 enrichment and lower transcript levels than homologous alleles lacking nearby TE insertions. However, there are apparent exceptions to this pattern, such as alleles adjacent to TEs having higher enrichment of H3K9me2 but also higher expression levels than homologous alleles lacking neighboring TEs (e.g. arrows in Figure 7A). In addition to the epigenetic effects of TEs, cis-regulatory sequences contained within TEs could interfere with gene regulation, leading to increased as well as reduced gene expression (Naito et al., 2009; Batut et al., 2013). In fact, recent studies that jointly analyzed mobilomes and transcriptomes in A. thaliana populations found that TE insertions result in equal frequencies of increased and decreased expression of flanking genes (Quadrana et al., 2016; Stuart et al., 2016). Besides, SNPs and copy number variants (CNVs) in regulatory sequences are also identified as significant contributors to differential expression between homologous alleles (Massouras et al., 2012). The observed variation in transcripts level is thus the joint consequence of TE’s epigenetic effects and other genetic factors. Importantly, there are known exceptions to the association between higher enrichment of heterochromatic marks and lower gene expression. In fact, enrichment of repressive epigenetic marks and heterochromatin proteins is surprisingly high for active genes normally located in the pericentromeric heterochromatin, and is required for the proper expression of these genes (Wakimoto and Hearn, 1990; Hearn et al., 1991; Yasuhara and Wakimoto, 2008; Riddle et al., 2011). For some of these genes, the association with a local heterochromatin environment was even observed for homologs located in euchromatic regions of other Drosophila species (Caizzi et al., 2016).

Importantly, our quantifications of the epigenetic effects of TEs and the associated functional consequences on gene expression are likely underestimates of the real effects in natural populations. Screens for genetic mutations repeatedly report an inverse correlation between the dominance of a deleterious allele and its fitness effect (reviewed in [Simmons and Crow, 1977; Wilkie, 1994; Osada et al., 2009]). If similar trends apply to the deleterious epigenetic effects of TEs, establishment of the wild-derived inbred Drosophila strains used here would have removed the majority of TEs with lethal or sub-lethal epigenetic effects (i.e. substantial functional consequences). The small sample size used here (two strains) precluded detecting the subtle functional consequences of TE-induced epigenetic effects that are expected to be prevalent in inbred strains. Future larger scale epigenomic and transcriptomic profiling of multiple, diverse population samples would be necessary to further investigate the functional consequence of TE’s epigenetic effects. It is also worth noting that not every gene included in the analysis is expressed at the embryonic stage studied (4–8 hr embryo). In fact, we observed a deficiency in the number of genes that both have adjacent TEs and are expressed in 4–8 hr embryos (using data from modEncode (Graveley et al., 2011); Fisher’s Exact Test, p=0.00017, odds ratio = 0.73). This deficiency is even stronger when only considering TEs with epigenetic effects (Fisher’s Exact Test, p<10−5, odds ratio = 0.58). Epigenetic marks established at embryonic stages are expected to influence both somatic and germline cells, and were experimentally demonstrated to have a long lasting functional effect through development (Gu and Elgin, 2013). Accordingly, our observations are consistent with selection preferentially removing TEs that result in spreading of repressive epigenetic marks to adjacent genes expressed in embryonic stages.

Compared to neutral TE insertions that confer no fitness effects, TEs exerting deleterious fitness effects are more strongly selected against and should appear rare in populations (reviewed in [Charlesworth and Langley, 1989; Lee and Langley 2010; Barrón et al., 2014]). Consistent with the prediction that the epigenetic effects of TEs have deleterious fitness consequences, we found that TE insertions with stronger epigenetic effects have lower population frequencies, demonstrating the importance of such effects in the population dynamics of TEs. In addition, theoretical work suggests that stable equilibrium of TE copy number in an outbreeding, meiotically recombining population requires synergistic epistasis of the deleterious effects of TEs; specifically host fitness must decrease faster than linear with respect to increases in TE copy number (Charlesworth and Charlesworth, 1983). However, despite being critical to explaining the population dynamics of TEs, synergistic epistasis has only been empirically supported for deleterious effects mediated by ectopic recombination between nonhomologous TE insertions (Langley et al., 1988; Petrov et al., 2003, 2011), one of the many proposed deleterious genetic mechanisms for TEs (reviewed in [Lee and Langley 2010]). Because piRNAs are generated through feed-forward cycles that involve TE transcripts (Gunawardane et al., 2007; Brennecke et al., 2007), we previously predicted that the deleterious epigenetic effects of TEs would confer the theoretically required synergistic epistasis for stable containment of TEs (Lee and Langley 2010; Lee 2015). The observed association between the abundance of a TE family and the propensity of its members to influence the epigenetic states of adjacent sequences supports this prediction, and further extends possible evolutionary mechanisms for stable containment of TE copy number in host populations.

An especially interesting observation is the stronger epigenetic effects of TEs in D. simulans compared to D. melanogaster. TE insertions in D. simulans are more likely to show spreading of H3K9me2 and result in larger increase in H3K9me2 enrichment compared to those of the same TE family in D. melanogaster. All else being equal, the stronger epigenetic effects should lead to stronger selection removing TE insertions in D. simulans. If the rate of TE proliferation is also similar in these two species, this could account for the lower genomic TE content (Clark et al., 2007) and fewer TE insertions (Dowsett and Young, 1982; Vieira et al., 1999; Vieira and Biémont, 2004; Kofler et al., 2015) in D. simulans compared to D. melanogaster. Our observations complement previous comparisons of A. thaliana and A. lyrata, which revealed negative associations between genomic TE content and the effectiveness of siRNA targeting (Hollister et al., 2011). Overall, these results strongly support the conclusion that variation in the epigenetic effects of TEs contributes to the divergent TE content observed between even closely related species and significantly impacts TE evolution.

Finally, we attempted to gain insights into the molecular mechanisms that could account for the between-species differences in the epigenetic effects of TEs. We observed lower amounts of heterochromatic DNA and higher expression of Su(var) genes in D. simulans, both of which are known to generate more extensive spread of repressive epigenetic marks from constitutive heterochromatin in D. melanogaster. The amount of heterochromatic DNA was among the first identified dosage-dependent PEV modifiers ([Dimitri and Pisano, 1989], reviewed in [Girton and Johansen, 2008]). Similarly, Su(var) genes were identified through mutants that suppress PEV, demonstrating that the wild type genes play positive roles in heterochromatin establishment and/or maintenance (reviewed in [Girton and Johansen, 2008; Elgin and Reuter, 2013]). In particular, Su(var)3–9, which encodes the H3K9 methyltransferase, displays significantly higher expression in D. simulans than in D. melanogaster, and its between-species difference in expression level ranks in the top 0.75% genome-wide (see Results). H3K9 methylation is critical for suppression of TE expression and transposition (Penke et al., 2016), and Su(var)3–9 mutations reduce the epigenetic effects of TEs on adjacent reporter genes (Sentmanat and Elgin, 2012). Furthermore, Su(var) 3–9 is a haploid suppressor and triploid enhancer of PEV (Schotta et al., 2002), suggesting that changes in transcript levels of Su(var)3–9 would result in quantitative differences in the epigenetic effects of TEs. Assuming that the epigenetic effects of TEs depend on the amount of heterochromatic DNA and the expression of Su(var) genes in D. melanogaster and D. simulans similarly, variation in these two genetic PEV modifiers provides a viable explanation for the observed differences in the epigenetic effects of TEs and more broadly divergent TE profiles between these two species.

Our observations support the hypothesis that the host genetic environment contributes to the extent of deleterious epigenetic effects of TEs and influences the population dynamics of TEs, pointing towards a rarely addressed mechanism for the widely observed variation of TEs. Furthermore, PEV is long known to be temperature sensitive (Gowen and Gay, 1933), and several abiotic factors influence heterochromatin function (Seong et al., 2012; Silver-Morse and Li, 2013). Thus, different environmental conditions present in diverse habitats could also contribute to variation in the epigenetic effects of TEs and the widely divergent TE profiles within and between species.

The observed significant variation in genetic PEV modifiers between D. melanogaster and D. simulans raises questions about its evolutionary causes. It is worth noting that the deleterious epigenetic effects of TEs are considered as a side effect of host-directed epigenetic silencing of TEs (Hollister and Gaut, 2009; Lee 2015), and direct positive selection for stronger epigenetic effects of TEs would be unlikely to explain the between-species differences in genetic PEV modifiers. Elevated transcript levels of Su(var) genes might have been selected for to silence burst expansions of specific TE families or other types of repetitive sequences. Alternatively, Su(var) genes are highly pleiotropic (reviewed in [Girton and Johansen, 2008; Eissenberg and Reuter, 2009; Elgin and Reuter, 2013]), and selection might have acted instead on their essential chromosomal functions, with varying influence on the epigenetic consequences of TEs as a secondary effect. Similarly, changes in the amounts of heterochromatic DNA could have resulted from selfish expansion of repetitive sequences (Charlesworth et al., 1994) and/or global changes in chromatin landscapes due to karyotype turnover (Kaiser and Bachtrog, 2010; Vicoso and Bachtrog, 2013). Our findings suggest that the evolution of TEs may be more tightly associated with the evolution of other cellular, chromosomal, and/or genetic processes than previously appreciated.

Materials and methods

Drosophila strains

Drosophila strains used in this study are D. melanogaster RAL315 (Bloomington Drosophila stock center (BDSC) #25181), RAL360 (BDSC #25186), and D. simulans w501 (Drosophila species stock center). Previous analysis showed that these two D. melanogaster inbred wildtype strains have low residual heterozygosity (Lack et al., 2015). Flies were cultured on standard medium at 25C, 12 hr light/12 hr dark cycles.

ChiP-Seq and RNA-Seq experiments

Before collecting embryos, mated flies were allowed to lay eggs on fresh apple juice agar plates for one hour. Embryos were then collected on fresh apple juice agar plates for 4 hr and aged for 4 hr (to enrich for 4–8 hr embryos). All fly rearing and embryo collections were performed at 25C. Chromatin isolation and immunoprecipitation were performed following the modEncode protocol (http://www.modencode.org/). The antibody used for H3K9me2 (abcam 1220) was validated by modEncode and showed high consistency between lots (Egelhofer et al., 2011). For each strain, there were at least two replicates and each IP replicate had a matching input. ChIP-Seq libraries were prepared with NuGen Ovation Ultralow Library Systems V2 (San Carlos, CA) and sequenced on Illumina Hi-Seq with 100 bp, paired-end reads. RNAs were extracted from embryos that were collected using the same procedures using the RNeasy Plus kit (Qiagen). There were two replicates for each strain. RNA-Seq libraries were prepared using Illumina TruSeq and sequenced on Illumina Hi-Seq with 100 bp, paired-end reads.

TE calls

We used highly conservative euchromatin-heterochromatin boundaries: 0.5 Mb distal from those reported previously for D. melanogaster (Riddle et al., 2011). For D. simulans, we used boundaries that are 0.5 Mb distal from the sharp transition in H3K9me2 enrichment, based on our ChIP-Seq data. For all the analyses reported, we excluded TEs, genes, and sequences in heterochromatic regions. For D. melanogaster strains, we used TE insertions reported with strong confidence (coverage ratio greater than or equal to 3; [Rahman et al., 2015]). TEs that are shared between two RAL strains, in shared H3K9me2 peaks in euchromatin (called by MACS2 and present in both strains, see below), and/or in exons were also excluded. TEs in the D. simulans genome were annotated according to (Chiu et al., 2013), using blastn (Camacho et al., 2009). In brief, we used the blast hit with smallest e-value and excluded a putative insertion when the blast hit had the same smallest e-value for more than one TE family. We required a putative TE call to have at least 100 bp, at least 80% identity to canonical TEs, and merged TE calls of the same family and within 500 bp. TEs of different families but were within 2 kb were called as putative TE clusters and excluded from the analysis. In both species, we excluded INE-1 TEs, most which are relicts of a TE family that experienced an ancient burst of transposition events and are now mostly fixed in populations (Kapitonov and Jurka, 2003; Singh and Petrov, 2004). Our study included 255 TEs for the Oregon-R strain, 419 TEs for RAL strains, and 349 TEs for the D. simulans strain.

ChIP-Seq data analysis

Raw reads were processed with trim-galore (‘Babraham Bioinformatics - Trim Galore!”) to remove adaptors and low quality sequences. Processed reads were mapped to release six reference D. melanogaster genome (Hoskins et al., 2015) or release two reference D. simulans genome (Hu et al. 2013), using bwa mem with default parameters (v 0.7.5) (Li and Durbin, 2009). Reads with mapping quality score lower than 30 were filtered using samtools (Li, 2011) and excluded from further analysis. We used Macs2 with a liberal significance threshold (p=0.2) to generate peak calls for IDR (irreproducible rate) analysis (Li et al., 2011), which evaluates the reproducibility of ChIP replicates. Replicates for our samples had low IDRs (Figure 2—figure supplement 3 and Figure 8—figure supplement 2), and were combined to generate a single H3K9me2 fold enrichment track (between IP and matching input) for each sample. Our analyses were based these fold-enrichment tracks.

The baseline H3K9me2 enrichment level is slightly different between the two D. melanogaster strains, potentially due to technical and/or biological reasons. As the enrichment of repressive epigenetic marks is generally confined to 10 kb from TEs (Lee 2015), we used the H3K9me2 enrichment levels 20–40 kb upstream and downstream of each TE insertion to normalize the background levels between the two strains. For each annotated TE insertion, we divided its flanking 20 kb upstream and 20 kb downstream sequences into 20 nonoverlapping 1 kb windows respectively (Figure 2—figure supplement 1). We then used Mann-Whitney U test to assess if H3K9me2 enrichment in the ith upstream and downstream windows differs significantly between the two strains. The most distant windows considered are 20 kb from TE insertions. The ‘extent of H3K9me2 spread’ is the farthest windows in which the H3K9me2 enrichment is consecutively and significantly higher in the strain with TE. When the farthest windows are different between the left and right sides of a TE insertion, we used the window closer to TE for the ‘extent of H3K9me2 spread’ (to be conservative). The ‘% increase of H3K9me2’ is the difference of median H3K9me2 enrichment between the two strains in the 0–1 kb windows immediately next to TEs (with TE strain minus without TE strain), divided by the enrichment level for the strain without TE.

For D. simulans TEs, we calculated relative fold enrichment with respect to the median H3K9me2 fold enrichment at flanking 20–40 kb upstream or downstream sequences, whichever had a higher median (to be conservative). D. melanogaster data were also analyzed using this method to allow between-species comparisons. We again used Mann-Whitney U test to assess if the relative H3K9me2 enrichment in a window is significantly higher than one, the background level of relative fold enrichment. Here, the ‘extent of spread’ is the farthest window in which the relative fold enrichment is consecutively and significantly higher than one. The ‘increase in fold enrichment’ is the median relative fold enrichment in the 0–1 kb window immediately next to TE, minus one. To evaluate the performance of this method, we compared D. melanogaster results using this method to those based on normalization between strains. We found significant correlations between the two approaches for indexes of TE’s epigenetic effects (Spearman rank ρ = 0.63 (extent of H3K9me2 spread) and 0.68 (increase in H3K9me2 enrichment), p<10−16 for both). The calls for the presence of epigenetic effects (extent of spread at least 1 kb) were consistent between the two methods for 73.3% of TEs. Among TEs with inconsistent results, 67.8% (18.1% of all TEs) were called as ‘no epigenetic effect’ by the single-genome method but ‘with epigenetic effect’ by the method that incorporate both strains, suggesting that the single-genome method is overall more conservative in estimating the epigenetic effects of TEs. For 80% of the TEs, the estimated extents of spread were either the same or differ within 2 kb between the two methods.

We estimated the percentage of sites annotated as simple repeats in a 10 kb window around each TE insertion (based on the repeat-masked release 6 D. melanogaster genome from https://genome.ucsc.edu/). Recombination rate estimates for TE insertions were interpolated from (Comeron et al., 2012), which reported average recombination rate of D. melanogaster in 1 Mb window. For TE-family level analysis, we only considered TE families with at least two observations. Abundance of each TE family is based on (Kofler et al., 2015). Ovarian piRNA sequences for two wildtype strains (w1118 and wK) were from (Brennecke et al., 2008; Kelleher et al., 2012), and the normalized count estimates of each TE family were from (Kelleher and Barbash, 2013). We used two endo-siRNA datasets: (1) the reported counts of endo-siRNA (excluded pre-microRNAs) in adult heads for each TE family (Ghildiyal et al., 2008), and (2) endo-siRNAs generated by Ago2 pull-down libraries from ovaries (Czech and Hannon, 2011). The raw endo-siRNA sequences were processed with trim-galore, mapped with bwa aln to all annotated TEs in D. melanogaster reference genome, and counted for each TE family. For both piRNAs (from [Kelleher and Barbash, 2013]) and siRNAs, those that mapped to more than one TE families were excluded from the analysis.

For gene-based analysis, we calculated average fold enrichment over gene bodies for each replicate and used quantile-normalization. We calculated a z-score for each gene (mean H3K9me2 enrichment of allele with nearby TE – mean H3K9me2 enrichment of allele lacking nearby TE)/(mean standard deviation of both strains). Our analysis excluded genes with ambiguous TE presence/absence status (such as a gene with TE within 2 kb in one strain but with TE within 5 kb in the other strain). We used D. melanogaster annotation 6.07 and D. simulans annotation 2.01.

ChIP-Seq data from Oregon-R were downloaded from the modEncode website (http://www.modencode.org/) and analyzed with the same procedures.

RNA-seq analysis

Raw reads were processed with trim-galore, followed by mapping to release six reference D. melanogaster genome (Hoskins et al., 2015) or release two reference D. simulans genome (Hu et al. 2013) using TopHat with default parameters (Trapnell et al., 2009). We used htseq-count (Anders et al., 2015) to count the number of reads mapping to exons and used DESeq2 (Love et al., 2014) to normalize and estimate expressional fold change between the two D. melanogaster strains. Estimates of transcript abundance were highly correlated between biological replicates (Pearson’s r = 0.98 (RAL315), 0.97 (RAL360), and 0.88 (D. simulans), p<10−16 for all). We only analyzed genes annotated as expressed in 4–8 hr embryos by the modEncode developmental time course study (Graveley et al., 2011). Indeed, genes annotated as no or extremely low expression in 4–8 hr embryos in the modEncode study have much fewer mapped reads than other genes in our RNA-seq data (median for RPKM, RAL315: 0.058 (not expressed) vs 13.16 (other genes), RAL360: 0.031 (not expressed) vs 12.70 (other genes), Mann-Whitney U test, p<10−16 for both). To investigate the functional consequence of TE-induced enrichment of repressive epigenetic marks, we categorized protein-coding genes according to their epigenetic states (RAL 315 or RAL360 higher?) and RNA transcript levels (RAL 315 or RAL360 higher?) in the two strains. The proportion of genes with predicted TE-induced epigenetic states and RNA transcript levels (higher H3K9me2 enrichment and lower expression for alleles adjacent to TEs) for genes with TEs in 10 kb were compared to other genes in the genome using Fisher’s Exact Test (also see Figure 7).

To compare expression levels between the two species, we used RPKM (reads per kilobase per million reads) and ranked genes from highest (small rank) to lowest (large rank) expression in each library. Z-score was calculated as (mean rank of D. melanogaster – mean rank of D. simulans)/mean standard deviation. A negative z-score represents higher expression in D. melanogaster while a positive z-score represents higher expression in D. simulans.

TE population frequency analysis

Raw reads from Drosophila Population Genomic Project (DPGP) 3 (Lack et al., 2015) were mapped to release six D. melanogaster reference genome using bwa mem with default parameters. Reads mapping within 500 bp upstream or downstream of TE insertion sites were parsed out using samtools. Parsed reads were assembled using phrap (Ewing and Green, 1998) following parameters in (Cridland et al., 2013). Assembled contigs were aligned against repeat-masked release six D. melanogaster genome using blastn. If one of the contigs spanned over at least 50 bp on both sides of a TE insertion site, the TE was called absent in the analyzed genome. If no contigs spanned the TE insertion site, contigs were aligned against canonical TE sequences and sequences of all TEs in the reference D. melanogaster genome using blastn. When there were blast hits to TE sequences, a TE was called present if there was a contig aligning at least 30 bp left or right of the TE insertion site without spanning the insertion site. All other scenarios were called as missing data. For population frequency analysis, we only included TEs that have at least 100 alleles (out of 197 alleles) called in DPGP3 genomes.

A large proportion of the analyzed TE insertions (68.5%) has zero population frequencies in the Zambian population (Figure 5—figure supplement 1). Accordingly, in some analyses, we also categorized TEs into those that are present in the Zambian population (‘high frequency’ TEs) and those that are absent (‘low frequency’ TEs). To account for the influence of TE family identity on TE’s population frequencies, we performed regression analysis using generalized linear model and generalized mixed linear model. Population frequencies of TE insertions (response variable) were treated as either dichotomous variable (‘high frequency’ TE or not) or count (the number of individuals in which a TE insertion is present). Because the distribution of TE count is overdispersed (i.e. the variance is greater than the mean), we modeled the TE count as having either ‘quasipoission’ or ‘negative binomial’ distribution. The influence of TE family identity was modeled as either fixed effect (generalized linear model) or random effect (generalized mixed linear model). The two indexes for the epigenetic effects of TEs (‘extent of H3K9me2 spread’ and ‘% increase in H3K9me2 enrichment’) were analyzed separately. Regression models used were:

logit p  TEs epigenetic effects (either "extent of H3K9me2 spread" or"% increase in H3K9me2 enrichment")+familyTE count  TEs epigenetic effects+family

where logit p is the log odds of whether a TE is observed in the Zambia population (‘high frequency’ TEs). We used MASS (Venables and Ripley, 2002) for negative binomial regression and lme4 (Bates et al., 2015) for generalized mixed linear models in R.

Heterochromatic repeat content

Identifying heterochromatic repeats

To identify repeats enriched in heterochromatic regions, we used KMC2 (Deorowicz et al., 2015) to quantify the number of 12-mers in IP and Input libraries from the H3K9me2 ChIP-seq experiments. To normalize between sequencing libraries, we divided the counts of 12-mers by the number of reads that mapped uniquely to the reference genome with at least 30 mapping quality score. The idea is to have a measure of the abundance of repetitive sequence relative to the single-copy regions of the genome. To find 12-mers enriched with H3K9me2, we divided the normalized counts from IP libraries by the normalized counts in corresponding Input libraries, and considered 12-mers with at least 1.5 fold enrichment with H3K9me2 as enriched in heterochromatic regions.

Quantifying the amount of heterochromatic repeats

We used pooled-genome sequencing of D. melanogaster and D. simulans data from (Kofler et al., 2015) to quantify the amount of heterochromatic repeats in these two species. It is worth noting that in (Kofler et al., 2015), libraries of these two species were prepared and sequenced in pairs, which minimized technical variations. We used KMC2 with the same parameters to count the number of H3K9me2 enriched 12-mers in each library. To account for the different completeness of reference genomes for D. melanogaster and D. simulans, and the variation in sequencing coverage between samples, we counted the number of reads mapped uniquely and with at least 30 mapping quality score to Flybase annotated orthologous exonic regions. Numbers of H3K9me2 enriched 12-mers were then normalized with sequencing coverage in these orthologous exonic regions. Results using different fold enrichment thresholds to identify H3K9me2 enriched 12-mers or using different normalization metrics gave consistent results (Table 3).

Table 3

Comparisons of H3K9me2 enriched 12-mers between D. melanogaster and D. simulans using different normalization and thresholds. Raw counts of H3K9me2 enriched 12-mers were normalized by read coverage of either the orthologous exonic regions or all orthologous genomic regions. ‘Fold enrichment threshold’ is the threshold for a 12-mer to be considered as H3K9me2 enriched in the ChIP-Seq data. ‘% of 12-mers’ is the proportion of H3K9me2 enriched 12-mers among all 12-mers.

https://doi.org/10.7554/eLife.25762.029
ANOVA p-value
normalizationfold enrichment threshold% of total 12-mersSpecieslibrary preparation method
exon reads1.520.21%1.29E-041.43E-07
exon reads212.89%8.62E-051.28E-07
exon reads36.41%1.01E-021.55E-07
all reads1.520.21%2.07E-092.40E-05
all reads212.89%5.14E-111.70E-05
all reads36.41%1.60E-031.20E-02

References

  1. 1
    The genome sequence of Drosophila Melanogaster
    1. MD Adams
    2. SE Celniker
    3. RA Holt
    4. CA Evans
    5. JD Gocayne
    6. PG Amanatides
    7. SE Scherer
    8. PW Li
    9. RA Hoskins
    10. RF Galle
    11. RA George
    12. SE Lewis
    13. S Richards
    14. M Ashburner
    15. SN Henderson
    16. GG Sutton
    17. JR Wortman
    18. MD Yandell
    19. Q Zhang
    20. LX Chen
    21. RC Brandon
    22. YH Rogers
    23. RG Blazej
    24. M Champe
    25. BD Pfeiffer
    26. KH Wan
    27. C Doyle
    28. EG Baxter
    29. G Helt
    30. CR Nelson
    31. GL Gabor
    32. JF Abril
    33. A Agbayani
    34. HJ An
    35. C Andrews-Pfannkoch
    36. D Baldwin
    37. RM Ballew
    38. A Basu
    39. J Baxendale
    40. L Bayraktaroglu
    41. EM Beasley
    42. KY Beeson
    43. PV Benos
    44. BP Berman
    45. D Bhandari
    46. S Bolshakov
    47. D Borkova
    48. MR Botchan
    49. J Bouck
    50. P Brokstein
    51. P Brottier
    52. KC Burtis
    53. DA Busam
    54. H Butler
    55. E Cadieu
    56. A Center
    57. I Chandra
    58. JM Cherry
    59. S Cawley
    60. C Dahlke
    61. LB Davenport
    62. P Davies
    63. B de Pablos
    64. A Delcher
    65. Z Deng
    66. AD Mays
    67. I Dew
    68. SM Dietz
    69. K Dodson
    70. LE Doup
    71. M Downes
    72. S Dugan-Rocha
    73. BC Dunkov
    74. P Dunn
    75. KJ Durbin
    76. CC Evangelista
    77. C Ferraz
    78. S Ferriera
    79. W Fleischmann
    80. C Fosler
    81. AE Gabrielian
    82. NS Garg
    83. WM Gelbart
    84. K Glasser
    85. A Glodek
    86. F Gong
    87. JH Gorrell
    88. Z Gu
    89. P Guan
    90. M Harris
    91. NL Harris
    92. D Harvey
    93. TJ Heiman
    94. JR Hernandez
    95. J Houck
    96. D Hostin
    97. KA Houston
    98. TJ Howland
    99. MH Wei
    100. C Ibegwam
    101. M Jalali
    102. F Kalush
    103. GH Karpen
    104. Z Ke
    105. JA Kennison
    106. KA Ketchum
    107. BE Kimmel
    108. CD Kodira
    109. C Kraft
    110. S Kravitz
    111. D Kulp
    112. Z Lai
    113. P Lasko
    114. Y Lei
    115. AA Levitsky
    116. J Li
    117. Z Li
    118. Y Liang
    119. X Lin
    120. X Liu
    121. B Mattei
    122. TC McIntosh
    123. MP McLeod
    124. D McPherson
    125. G Merkulov
    126. NV Milshina
    127. C Mobarry
    128. J Morris
    129. A Moshrefi
    130. SM Mount
    131. M Moy
    132. B Murphy
    133. L Murphy
    134. DM Muzny
    135. DL Nelson
    136. DR Nelson
    137. KA Nelson
    138. K Nixon
    139. DR Nusskern
    140. JM Pacleb
    141. M Palazzolo
    142. GS Pittman
    143. S Pan
    144. J Pollard
    145. V Puri
    146. MG Reese
    147. K Reinert
    148. K Remington
    149. RD Saunders
    150. F Scheeler
    151. H Shen
    152. BC Shue
    153. I Sidén-Kiamos
    154. M Simpson
    155. MP Skupski
    156. T Smith
    157. E Spier
    158. AC Spradling
    159. M Stapleton
    160. R Strong
    161. E Sun
    162. R Svirskas
    163. C Tector
    164. R Turner
    165. E Venter
    166. AH Wang
    167. X Wang
    168. ZY Wang
    169. DA Wassarman
    170. GM Weinstock
    171. J Weissenbach
    172. SM Williams
    173. T Woodage
    174. KC Worley
    175. D Wu
    176. S Yang
    177. QA Yao
    178. J Ye
    179. RF Yeh
    180. JS Zaveri
    181. M Zhan
    182. G Zhang
    183. Q Zhao
    184. L Zheng
    185. XH Zheng
    186. FN Zhong
    187. W Zhong
    188. X Zhou
    189. S Zhu
    190. X Zhu
    191. HO Smith
    192. RA Gibbs
    193. EW Myers
    194. GM Rubin
    195. JC Venter
    (2000)
    Science 287:2185–2195.
    https://doi.org/10.1126/science.287.5461.2185
  2. 2
  3. 3
  4. 4
  5. 5
  6. 6
  7. 7
  8. 8
  9. 9
  10. 10
  11. 11
    Fitting linear mixed-effects models using lme4
    1. D Bates
    2. M Mächler
    3. B Bolker
    4. S Walker
    (2015)
    Journal of Statistical Software, 67, 10.18637/jss.v067.i01.
  12. 12
  13. 13
  14. 14
  15. 15
  16. 16
  17. 17
  18. 18
  19. 19
  20. 20
  21. 21
  22. 22
  23. 23
  24. 24
  25. 25
  26. 26
  27. 27
  28. 28
    Transposable elements in inbreeding and outbreeding populations
    1. D Charlesworth
    2. B Charlesworth
    (1995)
    Genetics 140:415–417.
  29. 29
  30. 30
    Evolution of genes and genomes on the Drosophila phylogeny
    1. AG Clark
    2. MB Eisen
    3. DR Smith
    4. CM Bergman
    5. B Oliver
    6. TA Markow
    7. TC Kaufman
    8. M Kellis
    9. W Gelbart
    10. VN Iyer
    11. DA Pollard
    12. TB Sackton
    13. AM Larracuente
    14. ND Singh
    15. JP Abad
    16. DN Abt
    17. B Adryan
    18. M Aguade
    19. H Akashi
    20. WW Anderson
    21. CF Aquadro
    22. DH Ardell
    23. R Arguello
    24. CG Artieri
    25. DA Barbash
    26. D Barker
    27. P Barsanti
    28. P Batterham
    29. S Batzoglou
    30. D Begun
    31. A Bhutkar
    32. E Blanco
    33. SA Bosak
    34. RK Bradley
    35. AD Brand
    36. MR Brent
    37. AN Brooks
    38. RH Brown
    39. RK Butlin
    40. C Caggese
    41. BR Calvi
    42. A Bernardo de Carvalho
    43. A Caspi
    44. S Castrezana
    45. SE Celniker
    46. JL Chang
    47. C Chapple
    48. S Chatterji
    49. A Chinwalla
    50. A Civetta
    51. SW Clifton
    52. JM Comeron
    53. JC Costello
    54. JA Coyne
    55. J Daub
    56. RG David
    57. AL Delcher
    58. K Delehaunty
    59. CB Do
    60. H Ebling
    61. K Edwards
    62. T Eickbush
    63. JD Evans
    64. A Filipski
    65. S Findeiss
    66. E Freyhult
    67. L Fulton
    68. R Fulton
    69. AC Garcia
    70. A Gardiner
    71. DA Garfield
    72. BE Garvin
    73. G Gibson
    74. D Gilbert
    75. S Gnerre
    76. J Godfrey
    77. R Good
    78. V Gotea
    79. B Gravely
    80. AJ Greenberg
    81. S Griffiths-Jones
    82. S Gross
    83. R Guigo
    84. EA Gustafson
    85. W Haerty
    86. MW Hahn
    87. DL Halligan
    88. AL Halpern
    89. GM Halter
    90. MV Han
    91. A Heger
    92. L Hillier
    93. AS Hinrichs
    94. I Holmes
    95. RA Hoskins
    96. MJ Hubisz
    97. D Hultmark
    98. MA Huntley
    99. DB Jaffe
    100. S Jagadeeshan
    101. WR Jeck
    102. J Johnson
    103. CD Jones
    104. WC Jordan
    105. GH Karpen
    106. E Kataoka
    107. PD Keightley
    108. P Kheradpour
    109. EF Kirkness
    110. LB Koerich
    111. K Kristiansen
    112. D Kudrna
    113. RJ Kulathinal
    114. S Kumar
    115. R Kwok
    116. E Lander
    117. CH Langley
    118. R Lapoint
    119. BP Lazzaro
    120. SJ Lee
    121. L Levesque
    122. R Li
    123. CF Lin
    124. MF Lin
    125. K Lindblad-Toh
    126. A Llopart
    127. M Long
    128. L Low
    129. E Lozovsky
    130. J Lu
    131. M Luo
    132. CA Machado
    133. W Makalowski
    134. M Marzo
    135. M Matsuda
    136. L Matzkin
    137. B McAllister
    138. CS McBride
    139. B McKernan
    140. K McKernan
    141. M Mendez-Lago
    142. P Minx
    143. MU Mollenhauer
    144. K Montooth
    145. SM Mount
    146. X Mu
    147. E Myers
    148. B Negre
    149. S Newfeld
    150. R Nielsen
    151. MA Noor
    152. P O'Grady
    153. L Pachter
    154. M Papaceit
    155. MJ Parisi
    156. M Parisi
    157. L Parts
    158. JS Pedersen
    159. G Pesole
    160. AM Phillippy
    161. CP Ponting
    162. M Pop
    163. D Porcelli
    164. JR Powell
    165. S Prohaska
    166. K Pruitt
    167. M Puig
    168. H Quesneville
    169. KR Ram
    170. D Rand
    171. MD Rasmussen
    172. LK Reed
    173. R Reenan
    174. A Reily
    175. KA Remington
    176. TT Rieger
    177. MG Ritchie
    178. C Robin
    179. YH Rogers
    180. C Rohde
    181. J Rozas
    182. MJ Rubenfield
    183. A Ruiz
    184. S Russo
    185. SL Salzberg
    186. A Sanchez-Gracia
    187. DJ Saranga
    188. H Sato
    189. SW Schaeffer
    190. MC Schatz
    191. T Schlenke
    192. R Schwartz
    193. C Segarra
    194. RS Singh
    195. L Sirot
    196. M Sirota
    197. NB Sisneros
    198. CD Smith
    199. TF Smith
    200. J Spieth
    201. DE Stage
    202. A Stark
    203. W Stephan
    204. RL Strausberg
    205. S Strempel
    206. D Sturgill
    207. G Sutton
    208. GG Sutton
    209. W Tao
    210. S Teichmann
    211. YN Tobari
    212. Y Tomimura
    213. JM Tsolas
    214. VL Valente
    215. E Venter
    216. JC Venter
    217. S Vicario
    218. FG Vieira
    219. AJ Vilella
    220. A Villasante
    221. B Walenz
    222. J Wang
    223. M Wasserman
    224. T Watts
    225. D Wilson
    226. RK Wilson
    227. RA Wing
    228. MF Wolfner
    229. A Wong
    230. GK Wong
    231. CI Wu
    232. G Wu
    233. D Yamamoto
    234. HP Yang
    235. SP Yang
    236. JA Yorke
    237. K Yoshida
    238. E Zdobnov
    239. P Zhang
    240. Y Zhang
    241. AV Zimin
    242. J Baldwin
    243. A Abdouelleil
    244. J Abdulkadir
    245. A Abebe
    246. B Abera
    247. J Abreu
    248. SC Acer
    249. L Aftuck
    250. A Alexander
    251. P An
    252. E Anderson
    253. S Anderson
    254. H Arachi
    255. M Azer
    256. P Bachantsang
    257. A Barry
    258. T Bayul
    259. A Berlin
    260. D Bessette
    261. T Bloom
    262. J Blye
    263. L Boguslavskiy
    264. C Bonnet
    265. B Boukhgalter
    266. I Bourzgui
    267. A Brown
    268. P Cahill
    269. S Channer
    270. Y Cheshatsang
    271. L Chuda
    272. M Citroen
    273. A Collymore
    274. P Cooke
    275. M Costello
    276. K D'Aco
    277. R Daza
    278. G De Haan
    279. S DeGray
    280. C DeMaso
    281. N Dhargay
    282. K Dooley
    283. E Dooley
    284. M Doricent
    285. P Dorje
    286. K Dorjee
    287. A Dupes
    288. R Elong
    289. J Falk
    290. A Farina
    291. S Faro
    292. D Ferguson
    293. S Fisher
    294. CD Foley
    295. A Franke
    296. D Friedrich
    297. L Gadbois
    298. G Gearin
    299. CR Gearin
    300. G Giannoukos
    301. T Goode
    302. J Graham
    303. E Grandbois
    304. S Grewal
    305. K Gyaltsen
    306. N Hafez
    307. B Hagos
    308. J Hall
    309. C Henson
    310. A Hollinger
    311. T Honan
    312. MD Huard
    313. L Hughes
    314. B Hurhula
    315. ME Husby
    316. A Kamat
    317. B Kanga
    318. S Kashin
    319. D Khazanovich
    320. P Kisner
    321. K Lance
    322. M Lara
    323. W Lee
    324. N Lennon
    325. F Letendre
    326. R LeVine
    327. A Lipovsky
    328. X Liu
    329. J Liu
    330. S Liu
    331. T Lokyitsang
    332. Y Lokyitsang
    333. R Lubonja
    334. A Lui
    335. P MacDonald
    336. V Magnisalis
    337. K Maru
    338. C Matthews
    339. W McCusker
    340. S McDonough
    341. T Mehta
    342. J Meldrim
    343. L Meneus
    344. O Mihai
    345. A Mihalev
    346. T Mihova
    347. R Mittelman
    348. V Mlenga
    349. A Montmayeur
    350. L Mulrain
    351. A Navidi
    352. J Naylor
    353. T Negash
    354. T Nguyen
    355. N Nguyen
    356. R Nicol
    357. C Norbu
    358. N Norbu
    359. N Novod
    360. B O'Neill
    361. S Osman
    362. E Markiewicz
    363. OL Oyono
    364. C Patti
    365. P Phunkhang
    366. F Pierre
    367. M Priest
    368. S Raghuraman
    369. F Rege
    370. R Reyes
    371. C Rise
    372. P Rogov
    373. K Ross
    374. E Ryan
    375. S Settipalli
    376. T Shea
    377. N Sherpa
    378. L Shi
    379. D Shih
    380. T Sparrow
    381. J Spaulding
    382. J Stalker
    383. N Stange-Thomann
    384. S Stavropoulos
    385. C Stone
    386. C Strader
    387. S Tesfaye
    388. T Thomson
    389. Y Thoulutsang
    390. D Thoulutsang
    391. K Topham
    392. I Topping
    393. T Tsamla
    394. H Vassiliev
    395. A Vo
    396. T Wangchuk
    397. T Wangdi
    398. M Weiand
    399. J Wilkinson
    400. A Wilson
    401. S Yadav
    402. G Young
    403. Q Yu
    404. L Zembek
    405. D Zhong
    406. A Zimmer
    407. Z Zwirko
    408. DB Jaffe
    409. P Alvarez
    410. W Brockman
    411. J Butler
    412. C Chin
    413. S Gnerre
    414. M Grabherr
    415. M Kleber
    416. E Mauceli
    417. I MacCallum
    418. Drosophila 12 Genomes Consortium
    (2007)
    Nature 450:203–218.
    https://doi.org/10.1038/nature06341
  31. 31
  32. 32
  33. 33
  34. 34
  35. 35
  36. 36
  37. 37
  38. 38
    Evidence for horizontal transmission of the P transposable element between Drosophila species
    1. SB Daniels
    2. KR Peterson
    3. LD Strausbaugh
    4. MG Kidwell
    5. A Chovnick
    (1990)
    Genetics 124:339–355.
  39. 39
  40. 40
    Position effect variegation in Drosophila Melanogaster: relationship between suppression effect and the amount of Y chromosome
    1. P Dimitri
    2. C Pisano
    (1989)
    Genetics 122:793–800.
  41. 41
  42. 42
  43. 43
  44. 44
  45. 45
  46. 46
  47. 47
    Chapter 1 cellular mechanism for targeting heterochromatin formation in Drosophila
    1. JC Eissenberg
    2. G Reuter
    (2009)
    In: K. wangW Jeon, editors. International Review of Cell and Molecular Biology. Academic Press. pp. 1–47.
    https://doi.org/10.1016/s1937-6448(08)01801-7
  48. 48
  49. 49
  50. 50
  51. 51
    The evolutionary advantage of recombination
    1. J Felsenstein
    (1974)
    Genetics 78:737–756.
  52. 52
  53. 53
    Transposable elements
    1. DJ Finnegan
    (1992)
    Current Opinion in Genetics & Development 2:861–867.
    https://doi.org/10.1016/S0959-437X(05)80108-X
  54. 54
  55. 55
  56. 56
  57. 57
  58. 58
  59. 59
  60. 60
    Chromosome Constitution and Behavior in Eversporting and Mottling in Drosophila Melanogaster
    1. JW Gowen
    2. EH Gay
    (1934)
    Genetics 19:189–208.
  61. 61
  62. 62
  63. 63
  64. 64
  65. 65
  66. 66
    The effect of modifiers of position-effect variegation on the variegation of heterochromatic genes of Drosophila Melanogaster
    1. MG Hearn
    2. A Hedrick
    3. TA Grigliatti
    4. BT Wakimoto
    (1991)
    Genetics 128:785–797.
  67. 67
  68. 68
  69. 69
  70. 70
  71. 71
  72. 72
  73. 73
  74. 74
  75. 75
  76. 76
  77. 77
  78. 78
  79. 79
  80. 80
  81. 81
  82. 82
  83. 83
  84. 84
  85. 85
  86. 86
  87. 87
  88. 88
    Transposable elements in mendelian populations. I. A theory
    1. CH Langley
    2. JF Brookfield
    3. N Kaplan
    (1983)
    Genetics 104:457–471.
  89. 89
  90. 90
  91. 91
  92. 92
    Transposable elements in natural populations of Drosophila Melanogaster
    1. YC Lee
    2. CH Langley
    (2010)
    Philosophical Transactions of the Royal Society of London. Series B, Biological Sciences 365:1219–1228.
    https://doi.org/10.1098/rstb.2009.0318
  93. 93
  94. 94
  95. 95
  96. 96
    Dosage-dependent modifiers of position effect variegation in Drosophila and a mass action model that explains their effect
    1. J Locke
    2. MA Kotarski
    3. KD Tartof
    (1988)
    Genetics 120:181–198.
  97. 97
  98. 98
  99. 99
  100. 100
  101. 101
  102. 102
  103. 103
  104. 104
  105. 105
  106. 106
  107. 107
  108. 108
  109. 109
  110. 110
    Chromosome rearrangement by ectopic recombination in Drosophila Melanogaster: genome structure and evolution
    1. EA Montgomery
    2. SM Huang
    3. CH Langley
    4. BH Judd
    (1991)
    Genetics 129:1085–1098.
  111. 111
  112. 112
  113. 113
  114. 114
  115. 115
  116. 116
  117. 117
  118. 118
  119. 119
  120. 120
  121. 121
  122. 122
  123. 123
  124. 124
  125. 125
  126. 126
  127. 127
  128. 128
  129. 129
  130. 130
  131. 131
  132. 132
  133. 133
  134. 134
  135. 135
  136. 136
  137. 137
  138. 138
  139. 139
  140. 140
  141. 141
  142. 142
  143. 143
  144. 144
  145. 145
  146. 146
  147. 147
    The effects of chromosome rearrangements on the expression of heterochromatic genes in chromosome 2L of Drosophila Melanogaster
    1. BT Wakimoto
    2. MG Hearn
    (1990)
    Genetics 125:141–154.
  148. 148
  149. 149
  150. 150
  151. 151
  152. 152
  153. 153

Decision letter

  1. Magnus Nordborg
    Reviewing Editor; Vienna Biocenter, Austria

In the interests of transparency, eLife includes the editorial decision letter and accompanying author responses. A lightly edited version of the letter sent to the authors after peer review is shown, indicating the most substantive concerns; minor comments are not usually included.

Thank you for submitting your article "Pervasive epigenetic effects of Drosophila euchromatic transposable elements impact their evolution" for consideration by eLife. Your article has been favorably evaluated by Detlef Weigel (Senior Editor) and three reviewers, one of whom, Magnus Nordborg (Reviewer #1), is a member of our Board of Reviewing Editors. The following individual involved in review of your submission has agreed to reveal their identity: Brandon Gaut (Reviewer #2).

The reviewers have discussed the reviews with one another and the Reviewing Editor has drafted this decision to help you prepare a revised submission.

Summary:

This is a very thorough and important demonstration of the idea that epigenetic silencing of TEs in Drosophila may affect nearby genes, and lead to selection against TE insertions. This agrees with similar observations in plants, and suggests that this is a universal aspect of TE biology (at least in organisms with compact genomes).

Essential revisions:

All three reviewers are positive, but have produced lengthy, detailed, and non-overlapping lists of comments. Having gone through these comments, we think they are generally reasonable, self-explanatory and constructive, and we will therefore not follow standard eLife practice and try to extract the most important ones. You should obviously feel free to disagree with particular points, but in that case you will need to convince us that you are right!

Reviewer #1:

This is a really good paper that will likely prove to be major contribution to our understanding of TE dynamics. I have only a few concerns.

First, the paper is not up on the recent work in plants. The list quite long, but I would suggest starting with the following (and working backward from the refs):

Quadrana, Leandro, Amanda Bortolini Silveira, George F. Mayhew, Chantal LeBlanc, Robert A. Martienssen, Jeffrey A. Jeddeloh, and Vincent Colot. 2016. "The Arabidopsis thaliana Mobilome and Its Impact at the Species Level." eLife 5 (June). doi:10.7554/eLife.15716. [epigenetics silencing of TEs and selection on TEs in euchromatin]

Stuart, Tim, Steven Eichten, Jonathan Cahn, Yuliya Karpievitch, Justin Borevitz, and Ryan Lister. 2016. "Population Scale Mapping of Transposable Element Diversity Reveals Links to Gene Regulation and Epigenomic Variation." eLife 5 (December). doi:10.7554/eLife.20777. [epigentic silencing]

Kawakatsu, Taiji, Shao-Shan Carol Huang, Florian Jupe, Eriko Sasaki, Robert J. Schmitz, Mark A. Urich, Rosa Castanon, et al. 2016. "Epigenomic Diversity in a Global Collection of Arabidopsis thaliana Accessions." Cell 166 (2). Elsevier: 492-505. [genomewide epigenetics]

Note that the above look at DNA methylation, but this is essentially a proxy for H3K9.

Second, I'm somewhat concerned by using "present in Zambia" as a proxy for allele frequency. The logic is OK, but it is a bit crude, and there is a real chance that TE's outside Africa could have been affected by the environment. Also (subsection “TEs with epigenetic effects are more strongly selected against”, second paragraph), you do seem to have some real frequency data. Where is the figure?

Third, the last paragraph of the subsection “TEs with epigenetic effects are more strongly selected against”, about age and frequency is, frankly, very confused. A causal (statistical) relationship between age and frequency of a mutation is expected under any population genetics model except strong balancing selection. You observe an inverse correlation between silencing and frequency, and attribute this to purifying selection. I believe you are right, but it is formally possible that silencing simply decays with time. The right (and only) way to test this is to get an estimate of age that is independent of allele frequency. You could look for accumulation of mutations between copies of the same insertion, but you would never have enough power (I think – time is too short). Better is to look for the extent of haplotype sharing (the extent of linkage disequilibrium) surrounding insertion alleles. This decays with time, and alleles under selection should be surrounded by more extensive LD than neutral ones, conditional on allele frequency. This would be nice independent demonstration of selection on the more silenced ones.

The age of the family is not relevant to this (the time scales are completely different).

Reviewer #2:

In this remarkably thorough paper, Lee and Karpen examine the epigenetic effects of TE insertions on euchromatic regions of Drosophila genomes. The basic premise is that TEs elevate the incidence and spread of repressive epigenetic marks, causing dampening effects on the expression of nearby genes. Depending on the gene in question, these dampening effects could lead to deleterious selection on the TE-containing haplotypes. As such, epigenetic modifications of TEs may be a primary cause of deleterious selection against them. These ideas have been bandied about for a few years now, but there have been few thorough analyses of the idea.

To test these ideas, the authors start by comparing two inbred Drosophila strains for their TE content and the association of individual TE insertions (and euchromatic regions, for that matter) with enrichment of a repressive mark (H3K9me3). They then detail a number of analyses showing that TE insertions are associated with enrichment, that this enrichment often spreads to the TE's flanking regions, which this local enrichment can affect flanking genic regions, and that genic enrichment correlates negatively with gene expression. They also show that enrichment tends to be higher for TEs found at low frequency, which is a direct prediction of the 'deleterious selection by epigenetic modification' hypothesis. Finally, they extend comparisons to D. simulans. They present evidence that the epigenetic response is stronger in D. simulans, perhaps (?) due to modification of the titration of heterochromatic proteins.

This study is a welcome and important contribution to the field. It is novel in many aspects but particularly in its depth. The comparison between inbreds permit the address of existing ideas on a genome-wide scale but also permit comparisons among TE families, etc. In fact, I like this paper so much that I have no major comments about the content or any need for additional analyses. That said, it is a lengthy paper and, as might be expected of such a detailed work, there are many areas where clarity could be improved.

Specific comments:

Abstract: The abstract needs to be refined, as, in its current shape, it lacks in detail (e.g., H3K9me3).

Introduction: As a non-drosophilist I found myself puzzled by some of the statements about piRNAs and siRNAs that come later in the manuscript. My comprehension might have been helped by a paragraph or two that reviews known mechanisms of TE epigenetic modification in Drosophila. Would it be worth adding this information? [I was also puzzled about the mechanism of spread. If there is prior knowledge about this, it might be useful to add the information to a paragraph or two on mechanisms.]

Subsection “Euchromatic TEs exhibit extensive epigenetic effects on adjacent sequences”, fourth paragraph – because measures like the "extent of spread", "% increase in H3K9me2", "with/without epigenetic effect" etc. are used throughout, I think it is well worth giving their explicit definitions in the Results in addition to the Materials and methods, especially since the definitions are pretty short. This isn't necessary but may aid clarity for readers.

Subsection “TE families of LTR-type and targeted by piRNAs show stronger epigenetic effects”, third paragraph – I had difficulties synthesizing the statement that "there is no difference in the amount of piRNAs targeting different types of TE families", with the fact that piRNA amounts across families correlate with things like the "mean extent of spread" in Table 1. I guess this is saying that you can still rank families by piRNA amount and get some neat correlations, but it seems a bit strange.

Anyway, if nothing else, the description of Table 1 needs to be improved. From the text, I eventually realized these were Spearman correlations, but the table is not well described in its heading or in its first citation in the text. Finally, I believe this section (subsection “TE families of LTR-type and targeted by piRNAs show stronger epigenetic effects”) could benefit from some clarity about a couple of things: 1) how do piRNAs and siRNAs differ? 2) mapping piRNAs/siRNAs to TEs can be tricky due to multiple mapping phenomena and can totally mislead an analysis if not done carefully. Maybe I missed it in the Materials and methods, but it wasn't clear to me how the data in Table 1 were gathered (i.e., per family counts of piRNAs and siRNAs). Were pi/siRNAs counted once? Weighted by number of mapping positions? I assume they were mapped to the genome (as opposed to a reduced representation TE dataset). The variation in how these things are done is pretty huge; I just ask that it be clear. (Again, didn't see it in Materials and methods at all, but I think the Materials and methods implies that counts were gathered from another study?)

Subsection “TEs with epigenetic effects are more strongly selected against”, last paragraph – I found most of the discussion on age to be a bit muddled. If I'm not mistaken, frequency and age should covary, as argued, and (all else being equal) both should covary with epigenetic effect. Thus, the findings in the aforementioned paragraph seem to me to be unlikely, unless age is misestimated. My guess is that the sequence similarity measure to the reference is a misleading or noisy. (Depending on the history in a particular TE family, it's easy to envision cases where it might be positively misleading.) Not sure what to advise here, but I suspect it needs some more thought? Or could even be removed?

Subsection “Epigenetic effects of TEs result in differential epigenetic states of adjacent coding genes”, second paragraph – I found myself a bit confused here. Is the z-score compared between genes that differ in the presence/absence of flanking TEs in the two lines? If so, what window size defines "flanking"? Or is it really just the closest flanking TE without reference to distance? I'd just like a few more methodological details here in the flow of description of the results.

Table 2: Despite my best attempts, I could not understand it at all! Can you provide an example in the footnotes (for example, explicitly define the data in the first row) or in the text? Maybe I'm missing something obvious, but I just didn't get it!

Discussion, third paragraph: again, Table 2 is referred to almost obliquely. It'd be nice to be described in more detail or add a sentence that references the table explicitly – i.e., "For example, for TEs with higher enrichment in RAL315…". Great points about inbreds (Discussion, fourth paragraph) and synergistic epistasis (Discussion, fifth paragraph). Really enjoyed them!

Subsection “Chip-seq data analysis”, second paragraph and following. These are key definitions that (maybe) should be in the Results but definitely should be defined more clearly and precisely. I got the gist, but more precision and care are merited.

Reviewer #3:

In this paper, the authors analyze the effects that polymorphic TE insertions have on chromatin states in Drosophila. This paper is an extension of a previous study that was limited by the fact that the effect of polymorphic TE insertions could not be determined since only one reference was used (from ModEncode, for example). In this paper, the authors explicitly examine the effects of TE insertion polymorphism on variation in repressed chromatin signatures within the genome. This is an important contribution to the field and is very well written.

My major concern, which I will point out specifically in several places, is that the representation of the data limits the reader’s ability to understand the variation in the signal. In particular, many of the figures compress the data into box plots, where this compression is not appropriate. I will alert the authors to this paper:

"Beyond Bar and Line Graphs: Time for a New Data Presentation Paradigm". While the authors might think this concern is nitpicky, I don't think so. The authors are aiming for a high-profile paper and the readers should be able to see the structure of the underlying data and variation. Not all figures are at issue, but some can be revised.

The article is lacking reference to the work of the Kalmykova lab. In particular, the article must cite Shpiz et al. Euchromatic transposon insertions trigger […] This paper is an important contribution to the field.

Subsection “Euchromatic TEs exhibit extensive epigenetic effects on adjacent sequences”, second paragraph: It would be worth illuminating in the text the interesting observation in Figure 1 that embryos show these effects most strongly.

Subsection “Euchromatic TEs exhibit extensive epigenetic effects on adjacent sequences”, fourth paragraph: It is a little awkward to say "homologous sequences that lack nearby TEs." I think the authors mean "with corresponding alleles lacking the TE insertion."

Subsection “Euchromatic TEs exhibit extensive epigenetic effects on adjacent sequences”, fourth paragraph: Since the metrics "extent of spread" and "% increase in H3K9me" are used throughout, please exactly define these metrics here, not just in the Methods.

Subsection “Euchromatic TEs exhibit extensive epigenetic effects on adjacent sequences”, fourth paragraph: Please exactly define the characteristic: "associated with enrichment for H3K9me2 in at least 1 kb". Likewise, please define how the extent of spreading is defined (what is the cutoff for spread).

Figure 2 legend. Please clarify in the legend that these are TE insertion alleles that are absent in the other strain.

Subsection “TE families of LTR-type and targeted by piRNAs show stronger epigenetic effects”: There is a suggestion that LTRs have stronger effects. This is demonstrated in several ways, but I don't follow the statistics:

1) Subsection “TE families of LTR-type and targeted by piRNAs show stronger epigenetic effects”, second paragraph: There is suggestion of enrichment with several statements such as "eight of 11 TE families with over half of insertions […] are LTR-type" Is this enrichment significant, given there are more LTR families? A Fisher's exact would be suitable here. A similar point can be made about the next sentence.

2)They perform a statistical test of this phenomenon (subsection “TE families of LTR-type and targeted by piRNAs show stronger epigenetic effects”, second paragraph), but it is not clear the test is appropriate. Here, they seem to be binning all LTRs (I could be wrong) but this Mann-Whitney U result may be solely driven by one abundant LTR family, such as copia or roo. The authors should provide a more convincing case that LTRs really exert a stronger effect, in general, instead of this being driven by some families. This could be done in an ANOVA framework, where the effect of TE class is explicitly tested, in contrast to family level effects.

Subsection “TE families of LTR-type and targeted by piRNAs show stronger epigenetic effects”, third paragraph: "It is worth noting […]" I don't understand this sentence. It appears that they are saying that there are no differences, between strains, in piRNA abundance for each given family. Is that correct? Then why are P-values indicated for two genotypes, sense such a test between two strains would have one P value?

Figure 5 is an example of data compression with bar charts that is not appropriate. First, frequency should not be binned in this manner. Such binning is arbitrary and masks the underlying data. Please plot these values continuously. In addition, for 5B and 5C, plot the Y axis continuously for each TE insertion.

Also, in regard to the analysis of population frequencies, this analysis seems completely confounded by the previously reported family level effects and copia number effects. For example, copia elements (LTRs, with large copia number) may also have low frequencies for other reasons that have nothing to do with the effect they exert on local chromatin. I imagine there are two ways to deal with this. 1) All contrasts must control for within family effects. For example, the test should be: Do copia elements with a stronger chromatin effect have lower frequencies than other copia elements? Do roo elements with a stronger chromatin effect have lower frequencies than other roo elements? The family level effects must be accounted for. 2) A formal way to do this would be to directly model population frequency and perform a multiple logistic regression where family effects and chromatin effects are evaluated for their capacity to predict frequencies in the population. This way, family effects and chromatin effects can be explicitly measured and evaluated.

Right now, this binning approach is not appropriate.

Subsection “Epigenetic effects of TEs result in differential epigenetic states of adjacent coding genes”. Similar data compression by binning for the expression analysis is not appropriate. The Fisher's exact test approach in Table 2 doesn't seem appropriate. There is a very nice paired structure to these data that can simply be measured as the difference in RPKM between alleles with a TE and alleles without a TE. Then, a scatter plot can be shown to demonstrate the degree to which chromatin effects are predictive of gene expression differences. Here, the X-axis would be some continuous measure of chromatin effect and the Y-axis would be the measured effect on gene expression. Such a scatter plot is essential for the reader to evaluate the nature of these data. These bins do not allow the reader to understand what is going on.

Figure 7. It seems a binomial test would show 7B (proportion of TE spread) significant, but not for the rest of 7B. Please provide P values for a test of more dots above the diagonal than below the diagonal in 7B.

[Editors' note: further revisions were requested prior to acceptance, as described below.]

Thank you for resubmitting your work entitled "Pervasive epigenetic effects of Drosophila euchromatic transposable elements impact their evolution" for further consideration at eLife. Your article has been favorably evaluated by Detlef Weigel (Senior Editor) and three reviewers, one of whom is a member of our Board of Reviewing Editors.

The manuscript has been improved but there are some remaining issues that need to be addressed before acceptance, as outlined below:

As requested by Reviewer #1, please integrate other previous findings better.

As requested by Reviewer #3, please address points 1, 3, and 4 (or explain why this is not a good idea).

The paper is excellent: we just think it would be further improved with relatively little effort.

Reviewer #1:

I'm happy with your response to my comments with one exception: just adding a bunch of plant references at one point in the Introduction is not enough. Sure, the authors get the citation count credit they deserve, but in the name of scholarship you should do better.

What struck me when I read your paper is how closely it parallels findings in plants, and I think your paper would improve if you made this clear.

For example, the spreading of silencing has been shown several times in plants, and there is also strong evidence that selection depends on this. Population frequencies appear to be low, and the recent Arabidopsis paper by Colot's group clearly showed that insertions are uniform, but that selection removes them so that older (more frequent) TEs cluster around the centromeres.

I know it is a pain to do this, but there are several places in both Introduction and Discussion where you could note these similarities. It would really improve the paper, and would be a service to a field that is too often is organism-limited.

Someone needs to spend a day in a coffee house with a (virtual) stack of papers and a laptop. This is not too much to ask.

With respect to estimating the age of TEs, I understand that you cannot sequence the interior of the TE, but did you try the haplotype sharing? I think it could be more powerful than you suggest. This is real work, however, and I'm fine with your dropping the section instead.

Reviewer #2:

The authors have done a thorough job responding to the reviewers' comments. I heartily endorse publication in eLife.

Reviewer #3:

This is the second version of this article that I have read. The results are very interesting and important, but I feel the argument simplifies some complexity that should be more acknowledged. The authors have addressed many of my concerns, but, as in my first review, I will ask that the authors improve the presentation to allow readers to have a more clear understanding of the underlying complexities.

1) With regard to the definition of extent of spread, I am still slightly confused. The authors do a good job and provide a graphical representation of these definitions, but by "farthest window", how do they deal with values on one side being larger than values on the other side. Suppose the extent of spread to the left is 15 kb and the extent of spread to the right is 6 kb. Is a max function used such that 15 kb is the defined extent? Please be exact and precise in this definition.

2) Introduction, last paragraph, and elsewhere, I think there should be a bit more of a hedge on the conclusion that selection is determining the patterns. Therefore, a sentence like "we observed stronger selection against TE insertions" would preferably be "we find evidence that stronger selection against TE insertions"

3) This is my most significant comment. I am still confused about the expression analysis.

First, from Table 3, it appears this effect is only observed in one strain. In fact, for TEs in RAL315 (top set of genes in Table 2), the odds ratio is 0.95, the opposite expected. Therefore, it is not clear to me that one can argue that this result is straightforward. The authors should be upfront about this strain effect. It's interesting! But it certainly makes the story more complex and reduces the strength of their broader conclusion.

Second, Table 3 can still be improved. For one, the Fisher's Exact test is being performed against the background genes (with respect to sign of effect of enrichment and expression between the two strains), but the way it is presented, the nature of values in the 2X2 table is cryptic. From first look, one might think that the four values in each row are members of the 2X2 table. Now, this is not the correct interpretation, but I encourage the authors to make this clearer. In particular, why not just provide the four different 2X2 tables? This is also made confusing from the fact that, it appears, the numbers are wrong in the example below the table. By my count, for the RAL315 TEs, 56 is the value in the focal cell (H3K9me2 higher in 315, Higher expression in 360), but the values of the other cells (based on the text below) should sum to 121, correct? But, by my count, the rest of the genes with the other states add to 177 (75+56+46). I am sorry if I am wrong, but I guess I am still confused. I do think that presentation of the actual 2X2 tables that go into the FET would be best.

In light of these issues, I am still not convinced that Fisher's exacts tests of this kind are the best way to present the data. In my last review, I had requested a figure more like Figure 7 and I appreciate the authors providing it. However, it is clear from this new figure that the effect that these TEs have on expression, via H3K9me2, is complicated. For example, from the scatter plots for RAL315 TE insertions, there are 131 genes with TE inserts that have higher H3K9me2 in RAL315. However, among these genes, most (75) show a higher level of expression in 315 – the opposite pattern expected. It is also appears from these scatterplots that there is perhaps no correlation between fold H3K9me2 enrichment differences and differences in expression. This is a very interesting and extremely important complexity that must be made evident to the reader. Therefore, I request that these scatter plots be provided in the main figures with a revised version of Table 3 and a correlation coefficient be provided for these scatter plots.

4) The authors have done a good job controlling for the non-independent effects of family and spread of H3K9me in predicting TE insertion frequency. This is evident in their Methods (subsection “TE population frequency analysis”). However, Table 2 doesn't provide the jointly estimated regression coefficient for family. It just provides regression coefficients and p-values for extent and magnitude of spread. Please provide results from the full models, where regression coefficients were jointly estimated for epigenetic and family effects.

https://doi.org/10.7554/eLife.25762.038

Author response

Reviewer #1:

This is a really good paper that is likely to prove to be major contribution to our understanding of TE dynamics. I have only a few concerns.

First, the paper is not up on the recent work in plants. The list quite long, but I would suggest starting with the following (and working backward from the refs):

Quadrana, Leandro, Amanda Bortolini Silveira, George F. Mayhew, Chantal LeBlanc, Robert A. Martienssen, Jeffrey A. Jeddeloh, and Vincent Colot. 2016. "The Arabidopsis thaliana Mobilome and Its Impact at the Species Level." eLife 5 (June). doi:10.7554/eLife.15716. [epigenetics silencing of TEs and selection on TEs in euchromatin]

Stuart, Tim, Steven Eichten, Jonathan Cahn, Yuliya Karpievitch, Justin Borevitz, and Ryan Lister. 2016. "Population Scale Mapping of Transposable Element Diversity Reveals Links to Gene Regulation and Epigenomic Variation." eLife 5 (December). doi:10.7554/eLife.20777. [epigentic silencing]

Kawakatsu, Taiji, Shao-Shan Carol Huang, Florian Jupe, Eriko Sasaki, Robert J. Schmitz, Mark A. Urich, Rosa Castanon, et al. 2016. "Epigenomic Diversity in a Global Collection of Arabidopsis thaliana Accessions." Cell 166 (2). Elsevier: 492-505. [genomewide epigenetics]

Note that the above look at DNA methylation, but this is essentially a proxy for H3K9.

We added references suggested by the reviewer along with other plant references. These changes can be in the third paragraph of the Introduction.

Second, I'm somewhat concerned by using "present in Zambia" as a proxy for allele frequency. The logic is OK, but it is a bit crude, and there is a real chance that TE's outside Africa could have been affected by the environment. Also (subsection “TEs with epigenetic effects are more strongly selected against”, second paragraph), you do seem to have some real frequency data. Where is the figure?

We did estimate the frequency of individual TEs in the Zambian population. Consistent with previous findings that most of the TEs are rare in Drosophila, the majority (68.5%) of the TEs were not detected in the Zambian population. We thus used either (1) TE’s population frequencies (Spearman rank correlation tests, pointed out by reviewer) or (2) TE insertion present in Zambian population or not (Mann-Whitney U test), to investigate the associations between TE’s epigenetic effects and population frequencies. We further revised our text to clarify this point (subsection “TEs with epigenetic effects are more strongly selected against”, second paragraph). However, we agree with reviewers 1 and 3 that it is necessary to show the full-range of variation in TE’s population frequencies, so we added additional supplementary figures (Figure 5—figure supplements 1 and 3).

The two strains used in the study are from DGRP, a collection of D. melanogaster strains from North America. We chose the Zambian population to estimate population frequencies of TEs, instead of the North American population, for the following reasons. One is regarding the demographic history of this population, which we discuss in the main text (subsection “TEs with epigenetic effects are more strongly selected against”, second paragraph). Others are technical reasons. The sequencing of DGRP genomes used a mixture of 454 and pair-end Illumina of several read lengths (36bp-125bp). The sequencing coverage is also highly variable between genomes. The power to detect TE insertions is thus highly variable between strains in the DGRP panel. In addition, there are extensive identity-by-descent (IBD) regions among these genomes (Cridland et al. 2013 MBE), which would further complicate the analysis. In contrast, the Zambian population was sequenced with the same sequencing technology, similar read-length and read depth, while having limited IBD regions (Lack et al. 2015 Genetics). It is worth noting that the two DGRP strains used in the study were sequenced with pair-end Illumina long read lengths (125bp) and with high sequencing coverage. There is also limited IBD identified between these two strains.

Third, the last paragraph of the subsection “TEs with epigenetic effects are more strongly selected against”, about age and frequency is, frankly, very confused. A causal (statistical) relationship between age and frequency of a mutation is expected under any population genetics model except strong balancing selection. You observe an inverse correlation between silencing and frequency, and attribute this to purifying selection. I believe you are right, but it is formally possible that silencing simply decays with time. The right (and only) way to test this is to get an estimate of age that is independent of allele frequency. You could look for accumulation of mutations between copies of the same insertion, but you would never have enough power (I think – time is too short). Better is to look for the extent of haplotype sharing (the extent of linkage disequilibrium) surrounding insertion alleles. This decays with time, and alleles under selection should be surrounded by more extensive LD than neutral ones, conditional on allele frequency. This would be nice independent demonstration of selection on the more silenced ones.

The age of the family is not relevant to this (the time scales are completely different).

We agreed with reviewers 1 and 2 that our original attempt to exclude age of TE insertions as a contributor to variation in TE’s population frequencies is not well discussed and is confusing.

We really like the analysis the reviewer suggested. However, because these genomes were sequenced with Illumina short reads, we are unable to gather the internal sequences of individual TE insertions and would detect very few mutations, if any, for inference (on top of the issue of short time scale). Given that most TEs have very low population frequencies or appear as singletons (which are of particular interests) in our analysis, using haplotype sharing to infer age of TE insertions is expected to have limited statistical power. We anticipate future long-read sequencing will enable us to assemble the internal sequence of individual TE insertions, which will allow using sequence comparisons (e.g. divergence between two LTRs of a LTR insertion, or more internal sequences) to infer TE age. Because we are unable to provide unequivocal data with respect to this issue, we removed this section of the Discussion.

Reviewer #3: […] My major concern, which I will point out specifically in several places, is that the representation of the data limits the readers ability to understand the variation in the signal. In particular, many of the figures compress the data into box plots, where this compression is not appropriate. I will alert the authors to this paper:

"Beyond Bar and Line Graphs: Time for a New Data Presentation Paradigm". While the authors might think this concern is nitpicky, I don't think so. The authors are aiming for a high-profile paper and the readers should be able to see the structure of the underlying data and variation. Not all figures are at issue, but some can be revised.

We thank the reviewer for pointing out this thoughtful paper. To ensure concise presentation of the data, we retained several boxplots and a table in the main text, while added additional supplementary figures that allow readers to evaluate the full variation of the data. Please also see our response to comments on specific figures below.

The article is lacking reference to the work of the Kalmykova lab. In particular, the article must cite Shpiz et al. Euchromatic transposon insertions trigger […]. This paper is an important contribution to the field.

We agree with the reviewer and added discussion of this reference to the Discussion section (third paragraph).

Subsection “Euchromatic TEs exhibit extensive epigenetic effects on adjacent sequences”, second paragraph: It would be worth illuminating in the text the interesting observation in Figure 1 that embryos show these effects most strongly.

We added discussions about this observation as suggested (subsection “Euchromatic TEs exhibit extensive epigenetic effects on adjacent sequences”, first paragraph).

Subsection “Euchromatic TEs exhibit extensive epigenetic effects on adjacent sequences”, fourth paragraph: It is a little awkward to say "homologous sequences that lack nearby TEs." I think the authors mean "with corresponding alleles lacking the TE insertion."

We revised our text to improve clarity (subsection “Euchromatic TEs exhibit extensive epigenetic effects on adjacent sequences”, third paragraph).

Subsection “Euchromatic TEs exhibit extensive epigenetic effects on adjacent sequences”, fourth paragraph: Since the metrics "extent of spread" and "% increase in H3K9me" are used throughout, please exactly define these metrics here, not just in the Methods.

Subsection “Euchromatic TEs exhibit extensive epigenetic effects on adjacent sequences”, fourth paragraph: Please exactly define the characteristic: "associated with enrichment for H3K9me2 in at least 1 kb". Likewise, please define how the extent of spreading is defined (what is the cutoff for spread).

We agree with reviewer 3, and now include a figure to more clearly define all three metrics (“associated with enrichment for H3K9me2 in at least 1 kb”, "extent of spread" and "% increase in H3K9me") (subsection “Euchromatic TEs exhibit extensive epigenetic effects on adjacent sequences”, third paragraph), in addition to added text in the Results and revised Materials and methods (subsection “ChIP-Seq data analysis”, second paragraph).

Figure 2 legend. Please clarify in the legend that these are TE insertion alleles that are absent in the other strain.

We revised the figure legend accordingly.

Subsection “TE families of LTR-type and targeted by piRNAs show stronger epigenetic effects”: There is a suggestion that LTRs have stronger effects. This is demonstrated in several ways, but I don't follow the statistics:

1) Subsection “TE families of LTR-type and targeted by piRNAs show stronger epigenetic effects”, second paragraph: There is suggestion of enrichment with several statements such as "eight of 11 TE families with over half of insertions…are LTR-type" Is this enrichment significant, given there are more LTR families? A Fisher's exact would be suitable here. A similar point can be made about the next sentence.

These descriptions were intended as a general summary of the TE family data, and not as direct, quantitative support for the conclusion that LTR-type TE families have stronger epigenetic effects. We refrain from doing the Fisher’s Exact test because the categorization is fairly arbitrary (e.g. TE families with >5kb average spread of H3K9me2). Accordingly, we revised the text to avoid misleading readers at this stage of the analysis (subsection “TE families of LTR-type and targeted by piRNAs show stronger epigenetic effects”, second paragraph), and only make a more definitive conclusion after quantitative analysis using Mann-Whitney U tests(see response to next comment).

2) They perform a statistical test of this phenomenon (subsection “TE families of LTR-type and targeted by piRNAs show stronger epigenetic effects”, second paragraph), but it is not clear the test is appropriate. Here, they seem to be binning all LTRs (I could be wrong) but this Mann-Whitney U result may be solely driven by one abundant LTR family, such as copia or roo. The authors should provide a more convincing case that LTRs really exert a stronger effect, in general, instead of this being driven by some families. This could be done in an ANOVA framework, where the effect of TE class is explicitly tested, in contrast to family level effects.

The Mann-Whitney U tests were comparing “the proportion of TEs with epigenetic effects”, “the average extent of H3K9me2 spread”, and “average% increase in H3K9me2 enrichment” between LTR-type TE families and other types of TE families. In other words, one TE family is one data point. So, roo and copia, despite being abundant, are two data points in these analyses and are unlikely to be the only TE families driving the significant patterns. We agree that the “units” used for these comparisons was unclear and thus revised the text (subsection “TE families of LTR-type and targeted by piRNAs show stronger epigenetic effects”, second paragraph).

Subsection “TE families of LTR-type and targeted by piRNAs show stronger epigenetic effects”, third paragraph: "It is worth noting […]" I don't understand this sentence. It appears that they are saying that there are no differences, between strains, in piRNA abundance for each given family. Is that correct? Then why are P-values indicated for two genotypes, sense such a test between two strains would have one P value?

What we meant is “irrespective of the genotype (wK or w1118), piRNA abundance is not significantly different between LTR-type and other type of TE families”. Instead of abbreviating the two p-values as “p < 0.19”, we now included p-values for each genotype to avoid this confusion. We also revised the text accordingly (subsection “TE families of LTR-type and targeted by piRNAs show stronger epigenetic effects”, third paragraph).

Figure 5 is an example of data compression with bar charts that is not appropriate. First, frequency should not be binned in this manner. Such binning is arbitrary and masks the underlying data. Please plot these values continuously. In addition, for 5B and 5C, plot the Y axis continuously for each TE insertion.

We agree with reviewers 1 and 3 that the binning may not seem appropriate without knowledge of the variation of the data. Consistent with previous findings that most of the TEs are rare in Drosophila, the majority of the TEs (68.5%) were not detected in the Zambian population, and most of other TE insertions have low population frequencies. Accordingly, we performed two types of analyses – one by categorizing TEs into ‘observed and not observed’ in the Zambian population, and the other using the continuous distribution of TE frequencies. We have revised the text to clarify this point (subsection “TEs with epigenetic effects are more strongly selected against”, second paragraph). As recommended by the reviewer, we added new supplementary figures (Figure 5—figure supplements 1, 2, 3) that show the full distribution of TE frequencies. Please also see our response to the second comment of reviewer 1.

Also, in regard to the analysis of population frequencies, this analysis seems completely confounded by the previously reported family level effects and copia number effects. For example, copia elements (LTRs, with large copia number) may also have low frequencies for other reasons that have nothing to do with the effect they exert on local chromatin. I imagine there are two ways to deal with this. 1) All contrasts must control for within family effects. For example, the test should be: Do copia elements with a stronger chromatin effect have lower frequencies than other copia elements? Do roo elements with a stronger chromatin effect have lower frequencies than other roo elements? The family level effects must be accounted for. 2) A formal way to do this would be to directly model population frequency and perform a multiple logistic regression where family effects and chromatin effects are evaluated for their capacity to predict frequencies in the population. This way, family effects and chromatin effects can be explicitly measured and evaluated.

Right now, this binning approach is not appropriate.

We performed regression analysis to jointly consider the effects of TE’s epigenetic effects and family identity on their population frequencies. To ensure that our conclusion is robust with respect to the assumed regression models, we modeled the response variable (TE’s population frequencies) with several types of regression analyses.

1) Because a large proportion of TEs are not observed in the Zambian population (68.5%), we modeled TE’s population frequencies as a dichotomous variable and performed logistic regression.

2) Because the population frequencies of TEs were first observed as “counts” (the number of individuals with a particular TE insertion in the population), and the distribution of TE frequencies is skewed towards small numbers (L-shape), TE’s population frequencies can be treated as Poisson distributed. However, the variance of TE’s population frequencies is larger than the mean (overdispersion). Accordingly, we modeled TE frequencies as either “quasi-poisson” or “negative binomial”, since both are usually used for modeling overdispersed Poisson-like count variables, and performed generalized linear regression.

Because we do not have a priori knowledge about whether the variation in population frequencies between TE families is due to a fixed effect (i.e. each TE family has biological properties that make its insertions always have lower/higher population frequencies) or a random effect (variation between families), we treated the influence of TE family identity on TE’s population frequencies either as a fixed effect (generalized linear model) or a random effect (generalized mixed linear model).

For all regression models, we observed negative associations between TE’s epigenetic effects and population frequencies, after accounting for the effect of TE family identity. The regression coefficients are statistically significant for majority of the models considered, further corroborating the finding that TEs with stronger epigenetic effects are more likely to have low population frequencies. These new results are now in Table 2 (subsection “TEs with epigenetic effects are more strongly selected against”, third paragraph).

Subsection “Epigenetic effects of TEs result in differential epigenetic states of adjacent coding genes”. Similar data compression by binning for the expression analysis is not appropriate. The Fisher's exact test approach in Table 2 doesn't seem appropriate. There is a very nice paired structure to these data that can simply be measured as the difference in RPKM between alleles with a TE and alleles without a TE. Then, a scatter plot can be shown to demonstrate the degree to which chromatin effects are predictive of gene expression differences. Here, the X-axis would be some continuous measure of chromatin effect and the Y-axis would be the measured effect on gene expression. Such a scatter plot is essential for the reader to evaluate the nature of these data. These bins do not allow the reader to understand what is going on.

We now included a supplementary figure describing the full variation of the data (Figure 6—figure supplement 2). As mentioned in the Discussion, for various reasons, we observe genes that are “not following” the prediction (e.g. the allele with a TE has high H3K9me2 enrichment but also high RNA transcript level). We thus performed Fisher’s Exact test, and the proportion of genes following the expected influence of TE insertion (i.e. alleles adjacent to TE insertions have higher H3K9me2 enrichment and lower transcript levels) is significantly more than expectation calculated from genes without TEs in 10kb.

Figure 7. It seems a binomial test would show 7B (proportion of TE spread) significant, but not for the rest of 7B. Please provide P values for a test of more dots above the diagonal than below the diagonal in 7B.

To test for differences in epigenetic effects of TEs between D. melanogaster and D. simulans, we used paired Wilcoxon tests (Wilcoxon signed-rank test) to assess significance in Figure 7. This is a nonparametric test that takes into account both the sign (i.e. the number of dots above/below the diagonal in Figure 7) and the magnitude of differences. In contrast, a binomial test would miss the important information in the “magnitude of differences”. Accordingly, we still presented our original results using Wilcoxon signed-rank test.

[Editors' note: further revisions were requested prior to acceptance, as described below.]

Reviewer #1:

I'm happy with your response to my comments with one exception: just adding a bunch of plant references at one point in the Introduction is not enough. Sure, the authors get the citation count credit they deserve, but in the name of scholarship you should do better.

What struck me when I read your paper is how closely it parallels findings in plants, and I think your paper would improve if you made this clear.

For example, the spreading of silencing has been shown several times in plants, and there is also strong evidence that selection depends on this. Population frequencies appear to be low, and the recent Arabidopsis paper by Colot's group clearly showed that insertions are uniform, but that selection removes them so that older (more frequent) TEs cluster around the centromeres.

We agree that clearer acknowledgement of the relevant plant literature is appropriate, and have revised the text accordingly. However, we are confused about the Quadrana et al. 2016 (study from Colot’s group); we were unable to find data that demonstrated selection against spreading of methylation from TEs. They found that 10% of TE insertion sites have spreading of methylation in most accessions, but this includes accessions without TE insertions. These tend to be in pericentromeric regions, thus the enrichment of DNA methylation likely results from processes other than TE-induced spreading of this epigenetic mark. If this is not what the reviewer was referring to, we would appreciate the reviewer pointing out the relevant data presented in Quadrana et al. 2016. In addition, the accumulation of TEs in pericentromeric regions could also result from lower probability of ectopic exchange and/or reduced efficacy of selection (Hill-Robertson effects). Nevertheless, in this revised manuscript we still incorporate other important observations in this study (see below).

It is worth noting that, to avoid potential confounding influence of pericentromeric and peritelomeric regions already enriched for repressive epigenetic marks (likely the situation for Arabidopsis studies), we used highly conservative euchromatin-heterochromatin boundaries and only analyzed TEs and sequences that are in euchromatic regions.

Interestingly, similar to our observations, the two recent studies in Arabidopsis that jointly analyzed mobilomes and transcriptomes (Quadrana et al. 2016 and Stuart et al. 2016) also found complex influences of TEs on differential gene expression (also see below). However, neither study further associated these data with epigenomic data. Accordingly, we were unable to further contrast our observations with the Arabidopsis literature.

I know it is a pain to do this, but there are several places in both Introduction and Discussion where you could note these similarities. It would really improve the paper, and would be a service to a field that is too often is organism-limited.

Someone needs to spend a day in a coffee house with a (virtual) stack of papers and a laptop. This is not too much to ask.

We agree that more detailed discussion of the rich plant literature would provide broader context for our study, and we revised several places in Introduction and Discussion accordingly (see list below for details). We wholeheartedly agree that the field is often too organism-centric, and hope this revised manuscript can contribute to breaking that barrier.

1) In Introduction (third paragraph) and Discussion (second paragraph), we discussed that, in Arabidopsis, TEs have been identified as a major cause for differential epigenetic states between genomes, and some of the recent studies further demonstrated that is due to spreading of repressive epigenetic marks from silenced TEs. 4

2) In Discussion (second paragraph), we compared the extent of spreading of repressive epigenetic marks from TEs between Drosophila and Arabidopsis, which is on average more extensive in the former than the latter.

3) In Discussion (fourth paragraph), we pointed that our observed complexity of TEs’ influence on the differential gene expression echoes those documented in Arabidopsis.

4) In Discussion (seventh paragraph, already discussed in earlier version of the manuscript), we discussed the surprising parallel of associations between strength of epigenetic silencing of TEs and genomic TE content between closely related species in Drosophila and Arabidopsis.

It’s too bad that we don’t have coffee houses as lovely as those in Vienna, but hope we still did a proper job on this issue!

With respect to estimating the age of TEs, I understand that you cannot sequence the interior of the TE, but did you try the haplotype sharing? I think it could be more powerful than you suggest. This is real work, however, and I'm fine with your dropping the section instead.

We agree that haplotype sharing could be a really powerful approach to estimate the age of an allele. However, most of the TEs appear as singletons in Drosophila populations and there would be limited information for inference. An alternative approach may be to use haplotype sharing between alleles with and without TEs, which some groups are current developing. This is something of continuing interest, and we intend to formally address this in the near future (maybe with advice from the reviewer!). For this study, we would like to leave it out for the moment.

Reviewer #3:

This is the second version of this article that I have read. The results are very interesting and important, but I feel the argument simplifies some complexity that should be more acknowledged. The authors have addressed many of my concerns, but, as in my first review, I will ask that the authors improve the presentation to allow readers to have a more clear understanding of the underlying complexities.

1) With regard to the definition of extent of spread, I am still slightly confused. The authors do a good job and provide a graphical representation of these definitions, but by "farthest window", how do they deal with values on one side being larger than values on the other side. Suppose the extent of spread to the left is 15 kb and the extent of spread to the right is 6 kb. Is a max function used such that 15 kb is the defined extent? Please be exact and precise in this definition.

When the farthest windows are different between the left and right sides of the TE insertion, we used the smaller value in order to be conservative. In this example, the TE would be estimated to have 6kb spread. We clarified our definition in Materials and methods (subsection “ChIP-Seq data analysis”, second paragraph).

2) Introduction, last paragraph, and elsewhere, I think there should be a bit more of a hedge on the conclusion that selection is determining the patterns. Therefore, a sentence like "we observed stronger selection against TE insertions" would preferably be "we find evidence that stronger selection against TE insertions"

We revised the text in the last paragraph of the Introduction according to reviewer’s suggestion. We also double-checked that, at other places where selection was discussed, we did state that it is our evidence suggesting stronger selection, instead of directly observed stronger selection against TE insertions.

3) This is my most significant comment. I am still confused about the expression analysis.

First, from Table 3, it appears this effect is only observed in one strain. In fact, for TEs in RAL315 (top set of genes in Table 2), the odds ratio is 0.95, the opposite expected. Therefore, it is not clear to me that one can argue that this result is straightforward. The authors should be upfront about this strain effect. It's interesting! But it certainly makes the story more complex and reduces the strength of their broader conclusion.

Second, Table 3 can still be improved. For one, the Fisher's Exact test is being performed against the background genes (with respect to sign of effect of enrichment and expression between the two strains), but the way it is presented, the nature of values in the 2X2 table is cryptic. From first look, one might think that the four values in each row are members of the 2X2 table. Now, this is not the correct interpretation, but I encourage the authors to make this clearer. In particular, why not just provide the four different 2X2 tables? This is also made confusing from the fact that, it appears, the numbers are wrong in the example below the table. By my count, for the RAL315 TEs, 56 is the value in the focal cell (H3K9me2 higher in 315, Higher expression in 360), but the values of the other cells (based on the text below) should sum to 121, correct? But, by my count, the rest of the genes with the other states add to 177 (75+56+46). I am sorry if I am wrong, but I guess I am still confused. I do think that presentation of the actual 2X2 tables that go into the FET would be best.

This is an error at our end and we have corrected it (also see below).

In light of these issues, I am still not convinced that Fisher's exacts tests of this kind are the best way to present the data. In my last review, I had requested a figure more like Figure 7 and I appreciate the authors providing it. However, it is clear from this new figure that the effect that these TEs have on expression, via H3K9me2, is complicated. For example, from the scatter plots for RAL315 TE insertions, there are 131 genes with TE inserts that have higher H3K9me2 in RAL315. However, among these genes, most (75) show a higher level of expression in 315 – the opposite pattern expected. It is also appears from these scatterplots that there is perhaps no correlation between fold H3K9me2 enrichment differences and differences in expression. This is a very interesting and extremely important complexity that must be made evident to the reader. Therefore, I request that these scatter plots be provided in the main figures with a revised version of Table 3 and a correlation coefficient be provided for these scatter plots.

We agree with the reviewer that the complexity between the associations of H3K9me2 enrichment and transcript levels is of biological significance and should be more clearly pointed out. To address reviewer’s comments, we did the following in this revision.

1) For differential expression analysis, we excluded genes that are deemed to have no or extremely low expression by the modEncode developmental time course data. These genes indeed have no or extremely few mapped reads in our dataset, and the estimated expression fold changes are expected to be more prone to errors than other genes. Details are included in Materials and methods (subsection “RNA-seq analysis”, first paragraph).

2) Figure 3—figure supplement is now Figure 7A. We replaced the confusing Table 3 with Figure 7B, which includes 2x2 tables for testing whether there is an enrichment of genes supporting the epigenetic effects of TEs on gene expression. We also pointed out genes that are not following the expected trend (Figure 7A, arrows; subsection “Epigenetic effects of TEs result in differential epigenetic states of adjacent coding genes”, last paragraph; Discussion, fourth paragraph) and the differences between two strains (subsection “Epigenetic effects of TEs result in differential epigenetic states of adjacent coding genes”, last paragraph), as pointed out by the reviewer.

3) As requested by the reviewer, we provided results for the correlation tests and (if the test is significant) the correlation coefficient between z score for differential epigenetic states and fold change for expression difference (subsection “Epigenetic effects of TEs result in differential epigenetic states of adjacent coding genes”, last paragraph). We found a negative, though weak, correlation between z score and log2 fold change for genes without TEs, but no significant correlations for genes with 10kb to TEs.

4) We provided potential explanations for the complex relationship between H3K9me2 enrichment status and differential gene expression.

A) Incidences of TE insertions leading to both down- and up-regulation of gene expression have been observed. Consistently, recent genome-wide studies in Arabidopsis found that TEs result in equal frequency of increased and decreased expression of neighboring genes (Discussion, fourth paragraph).

B) Structural variants may also contribute to differential gene expression (discussed in previous version of the manuscript).

C) This study used inbred strains, and TEs inducing strong reduction in transcript levels are expected to have been removed during the inbreeding process. The small sample size in our current study (two strains) likely precluded detecting subtle functional consequences of TE’s epigenetic effects that are expected prevalent in inbred strains (Discussion, fifth paragraph).

D) There are known exceptions to the negative influence of enrichment of heterochromatic marks on gene expression. In Drosophila, multiple essential genes are located in pericentromeric heterochromatin, and their expression depends on a heterochromatin environment (i.e. enrichment of H3K9me2/3 marks and heterochromatin proteins). Furthermore, several homologs of D. melanogaster heterochromatic genes are located in euchromatic regions in other species and, even in euchromatin, these homologs are also enriched with heterochromatin proteins/marks, suggesting conservation of epigenetic environment for gene function (Discussion, fourth paragraph).

4) The authors have done a good job controlling for the non-independent effects of family and spread of H3K9me in predicting TE insertion frequency. This is evident in their Methods (subsection “TE population frequency analysis”). However, Table 2 doesn't provide the jointly estimated regression coefficient for family. It just provides regression coefficients and p-values for extent and magnitude of spread. Please provide results from the full models, where regression coefficients were jointly estimated for epigenetic and family effects.

These data are now provided as Table 2—source data 1. Because of the large number of TE families (which generates an extensive table), we did not provide this in the main text Table 2.

https://doi.org/10.7554/eLife.25762.039

Article and author information

Author details

  1. Yuh Chwen G Lee

    1. Division of Biological Systems and Engineering, Lawrence Berkeley National Laboratory, Berkeley, United States
    2. Department of Molecular and Cell Biology, University of California Berkeley, Berkeley, United States
    Contribution
    YCGL, Conceptualization, Data curation, Formal analysis, Investigation, Methodology, Writing—original draft, Writing—review and editing
    For correspondence
    grylee@lbl.gov
    Competing interests
    The authors declare that no competing interests exist.
    ORCID icon 0000-0002-0081-7892
  2. Gary H Karpen

    1. Division of Biological Systems and Engineering, Lawrence Berkeley National Laboratory, Berkeley, United States
    2. Department of Molecular and Cell Biology, University of California Berkeley, Berkeley, United States
    Contribution
    GHK, Conceptualization, Resources, Supervision, Funding acquisition, Methodology, Writing—review and editing
    For correspondence
    GHKarpen@lbl.gov
    Competing interests
    The authors declare that no competing interests exist.
    ORCID icon 0000-0003-1534-0385

Funding

National Institute of General Medical Sciences (R01 GM117420)

  • Gary H Karpen

The funders had no role in study design, data collection and interpretation, or the decision to submit the work for publication.

Acknowledgements

We thank Julie Cridland, Robert Kofler, and Nelson Lau for discussions and data for the annotations of TE insertions, Sasha Langley and other members of the Karpen lab for helpful discussions of the project, and Hsiao-Han Chang for helpful discussions regarding statistical analysis. We greatly appreciate Andrea Betancourt and Mia Levine for critically reading earlier version of the manuscript, and Magnus Nordborg, Brandon Gaut, and one anonymous reviewer for their helpful and constructive comments. This study was supported by NIH R01 GM117420 to GHK.

Reviewing Editor

  1. Magnus Nordborg, Vienna Biocenter, Austria

Publication history

  1. Received: February 6, 2017
  2. Accepted: June 9, 2017
  3. Version of Record published: July 11, 2017 (version 1)

Copyright

© 2017, Lee et al.

This article is distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use and redistribution provided that the original author and source are credited.

Metrics

  • 1,391
    Page views
  • 225
    Downloads
  • 7
    Citations

Article citation count generated by polling the highest count across the following sources: Scopus, Crossref, PubMed Central.

Download links

A two-part list of links to download the article, or parts of the article, in various formats.

Downloads (link to download the article as PDF)

Download citations (links to download the citations from this article in formats compatible with various reference manager tools)

Open citations (links to open the citations from this article in various online reference manager services)