Activation of individual L1 retrotransposon instances is restricted to cell-type dependent permissive loci

Abstract

LINE-1 (L1) retrotransposons represent approximately one sixth of the human genome, but only the human-specific L1HS-Ta subfamily acts as an endogenous mutagen in modern humans, reshaping both somatic and germline genomes. Due to their high levels of sequence identity and the existence of many polymorphic insertions absent from the reference genome, the transcriptional activation of individual genomic L1HS-Ta copies remains poorly understood. Here we comprehensively mapped fixed and polymorphic L1HS-Ta copies in 12 commonly-used somatic cell lines, and identified transcriptional and epigenetic signatures allowing the unambiguous identification of active L1HS-Ta copies in their genomic context. Strikingly, only a very restricted subset of L1HS-Ta loci - some being polymorphic among individuals - significantly contributes to the bulk of L1 expression, and these loci are differentially regulated among distinct cell lines. Thus, our data support a local model of L1 transcriptional activation in somatic cells, governed by individual-, locus-, and cell-type-specific determinants.

https://doi.org/10.7554/eLife.13926.001

eLife digest

Retrotransposons, also known as jumping genes, have invaded the genomes of most living organisms. These fragments of DNA have the ability to move or copy themselves from one location of a chromosome to another. Depending on where they insert themselves, retrotransposons can modify the sequence of nearby genes, which can alter or even abolish their activity. Although these genetic modifications have contributed significantly to the evolution of different species, retrotransposons can also have detrimental effects; for example, by causing new cases of genetic diseases.

Adult human cells have a number of mechanisms that work to keep the activity of retrotransposons at a very low level. However, in many types of cancers retrotransposons escape these defense mechanisms and ‘jump’ actively. This is thought to contribute to the development and spread of cancerous tumors.

To understand how jumping genes are mobilized, a fundamental question must be answered: is the high jumping gene activity observed in some cell types a result of activating many copies of the retrotransposons, or only a few of them? This question has been difficult to address because there are more than one hundred copies of retrotransposons that could potentially move in humans, many of which have not even been referenced in the human genome map. Furthermore, each copy is almost identical to another one, making it difficult to discriminate between them.

Philippe et al. have now developed an approach that can map where individual retrotransposons are located in the genome of normal and cancerous cells and measure how active these jumping genes are. This revealed that only a very restricted number of them are active in any given cell type. Moreover, different subsets of jumping genes are active in different cell types, and their locations in the genome often do not overlap.

Thus, whether jumping genes are activated depends on the cell type and their position in the genome. These results are in contrast to the prevalent view that retrotransposons are activated in a more widespread manner across the genome, at least in cancerous cells. Overall, Philippe et al.’s findings pave the way towards characterizing the chromosome regions in which retrotransposons are frequently activated and understanding how they contribute to cancer and other diseases.

https://doi.org/10.7554/eLife.13926.002

Introduction

At least half of our DNA is derived from repeated and dispersed sequences called retrotransposons, a class of mobile genetic elements which proliferate via an RNA-mediated copy-and-paste mechanism termed retrotransposition (see [Burns and Boeke, 2012; Hancks and Kazazian, 2012; Richardson et al., 2015] for recent reviews). Since most copies have accumulated mutations or are truncated, they are unable to initiate new cycles of retrotransposition and could be considered as molecular fossils (although in some cases they might nevertheless still be transcriptionally active [Macia et al., 2011]). In contrast, the youngest and human-specific L1 subfamily, 'transcribed L1, subset a' or L1HS-Ta, continues to retrotranspose and to accumulate in modern human genomes (Boissinot et al., 2000). Hence, each individual has hundreds of additional copies not present in the reference genome, referred to as 'non-reference' L1HS-Ta, which contribute to our genetic diversity (Xing et al., 2009; Cordaux and Batzer, 2009; Ewing and Kazazian, 2010; Beck et al., 2010; Huang et al., 2010; Iskow et al., 2010; Kidd et al., 2010; Lupski, 2010; Ewing and Kazazian, 2011; Ray and Batzer, 2011; Stewart et al., 2011; Mir et al., 2015). 
Recent advances in deep-sequencing technologies have also led to the discovery that L1HS-Ta are not only able to mobilize in the germline, in the early embryo and in embryonic stem cells - resulting in inheritable genetic variations (van den Hurk et al., 2007; Wissing et al., 2012; Hancks and Kazazian, 2012; Macia et al., 2014) - but can also retrotranspose in some somatic tissues such as brain (Faulkner et al., 2009; Coufal et al., 2009; Baillie et al., 2011; Evrony et al., 2012; Erwin et al., 2014; Richardson et al., 2014; Upton et al., 2015), and in many epithelial cancers (Miki et al., 1992; Iskow et al., 2010; Solyom et al., 2012; Shukla et al., 2013; Rodić and Burns, 2013; Pitkänen et al., 2014; Helman et al., 2014; Tubio et al., 2014; Goodier, 2014; Ewing et al., 2015; Rodić et al., 2015; Doucet-O'Hare et al., 2015; Paterson et al., 2015).

The overall ability of L1 elements to retrotranspose presumably results from the balance between the activities of the L1 sequences themselves, and the effects of restricting cellular pathways. The first step required to initiate retrotransposition of a particular L1 instance is its transcriptional activation: this is primarily driven by an internal promoter located within the L1 5' UTR (Swergold, 1990; Minakami et al., 1992; Tchénio et al., 2000; Athanikar et al., 2004), but can be repressed by CpG methylation (Yoder et al., 1997; Bourc'his and Bestor, 2004; Muotri et al., 2010; Wissing et al., 2012; Castro-Diaz et al., 2014). Production of L1 RNA transcripts is essential both for the translation of L1-encoded proteins, ORF1p and ORF2p, which are required for retrotransposition (Moran et al., 1996), and to act as a template for reverse transcription itself (Wei et al., 2001). After reverse transcription and genomic integration, the sequences of each L1 element can accumulate genetic alterations (mutations, deletions, insertion of nested transposable elements), and these can alter the intrinsic integrity and biochemical activity of these copies. As a result, only a fraction of L1 elements are retrotransposition-competent, even when cloned in a plasmid and tested in cellular assays, with their expression driven by a strong constitutive promoter (Brouha et al., 2003; Beck et al., 2010). These so-called 'hot' L1 elements are highly enriched among the youngest L1HS-Ta insertions, which are polymorphic among individuals (Beck et al., 2010; Lupski, 2010; Beck et al., 2011). Finally, additional cellular pathways and restriction factors can limit L1 activities at multiple other stages of the L1 retrotransposition cycle (see [Heras et al., 2014; Richardson et al., 2015; Pizarro and Cristofari, 2016] for reviews).

Our understanding of L1 transcriptional activation, particularly in the context of different cell types, remains extremely limited. Indeed, studying this process is complicated by the extent of L1 insertional polymorphisms in individual genomes and the extreme level of sequence identity between the copies of the youngest (and most-active) L1HS-Ta subfamily and with older copies of retrotransposition-incompetent subfamilies. Theoretically, the high L1 activity observed in particular cell types could result from global unleashing of most L1HS-Ta copies. Alternatively, it could derive from a few deregulated L1HS-Ta instances. To resolve these competing models, we mapped the location of each L1HS-Ta element dispersed in the genome of a panel of normal and transformed human cells, identified a genomic signature for the transcriptionally active copies and investigated the contribution of each of them to the bulk of L1HS-Ta transcripts. We found that individual L1 instances exhibit both locus- and cell-type-specific activation, implying that L1 mutagenic activity originates from 'hot L1' inserted in permissive loci and suggesting an unforeseen new layer of cell-type specific regulation to control endogenous retrotransposons.

Results

Global expression of recent L1 elements is variable in human cells

Human L1-derived RNA transcripts and proteins, required for L1 retrotransposition, are detected in embryonic stem cells, in embryonal carcinoma cells, and other transformed cells or tumors, as well as in neuronal progenitor cells, but not in most primary cells, such as fibroblasts (Faulkner et al., 2009; Coufal et al., 2009; Belancio et al., 2010; Wissing et al., 2012). To study the polymorphism and expression of the L1HS-Ta subfamily at the level of individual genomic instances, we selected twelve widely used cell lines spanning these different categories (Supplementary file 1). These included 10 cell lines which have been characterized in depth as part of the ENCODE project (Bernstein et al., 2012), together with two others: the commonly used embryonic lung fibroblast line MRC-5, and the embryonal carcinoma cell line 2102Ep, which is known to express high levels of endogenous L1HS-Ta (Leibold et al., 1990).

As a first estimate of L1 activity, we quantified and compared the endogenous levels of the L1-encoded ORF1p protein in the different cell lines. We detected ORF1p expression in whole cell extracts of half of the transformed cell lines (Figure 1—figure supplement 1), consistent with the proportion of human tumors expressing ORF1p (Rodić et al., 2014) and with previous work (Belancio et al., 2010). As expected, no ORF1p was detected in primary fibroblasts. ORF1p associates with the L1 mRNA, and L1 ORF2p, to form a ribonucleoprotein particle (RNP), which mediates the retrotransposition reaction (Kulpa and Moran, 2006; Doucet et al., 2010). To ensure the highest sensitivity and to enrich for functional ORF1p (at least able to bind RNA), we prepared L1 RNPs by sucrose cushion ultracentrifugation and probed ORF1p by immunoblot. We observed results similar to those obtained with whole cell extracts, except that ORF1p was faintly detected in two additional transformed cell lines (HeLa S3 and Hep G2; Figure 1a). In a complementary approach, we estimated the proportion of L1 transcripts originating from the L1HS-Ta subfamily by counting RNA-seq reads that map to the L1HS consensus sequence and encompass the subfamily-diagnostic SNPs in the L1 3' UTR sequence (ACA for L1HS-Ta, ACG for L1HS-PreTa and GAG for L1PA2 and older) (Boissinot et al., 2000). For this analysis, publicly available data from the hESC line H1 were also included (no stranded polyA+ RNA-seq data were available for MRC-5 and HEK-293 cells). Consistent with L1 RNP quantification, MCF7 and 2102Ep cells exhibit the highest levels of L1HS-Ta RNA-seq tags. Most other transformed cells have intermediate levels, while HCT 116 and primary fibroblasts (BJ, IMR-90) have extremely reduced L1HS-Ta levels (Figure 1b). In agreement with previous studies on other hESC, H1 cells express relatively high levels of L1HS-Ta (Garcia-Perez et al., 2007; Macia et al., 2011). Altogether these data indicate that, in several (but not all) transformed cells and in hESC, L1HS-Ta retrotransposons can escape the epigenetic, transcriptional and post-transcriptional controls that usually limit their expression in most somatic cells (Faulkner et al., 2009).
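The read-counting estimate described above can be illustrated with a minimal Python sketch. It assumes reads aligned without gaps to the L1HS consensus and represented as (sequence, alignment start) pairs; the function names, the `diag_pos` coordinate and the per-million scaling are hypothetical choices made for illustration, not the pipeline used in the study.

```python
from collections import Counter

# Diagnostic trinucleotides in the L1 3' UTR, as listed in the text.
DIAGNOSTIC = {"ACA": "L1HS-Ta", "ACG": "L1HS-PreTa", "GAG": "L1PA2_or_older"}

def classify_read(aligned_seq, aln_start, diag_pos):
    """Assign a read to a subfamily from the trinucleotide at the diagnostic site.

    aligned_seq: read bases, assumed aligned without gaps to the consensus
    aln_start:   0-based consensus coordinate of the first read base
    diag_pos:    0-based consensus coordinate of the first diagnostic base
                 (hypothetical value supplied by the caller)
    """
    offset = diag_pos - aln_start
    if offset < 0 or offset + 3 > len(aligned_seq):
        return None  # the read does not cover the diagnostic site
    return DIAGNOSTIC.get(aligned_seq[offset:offset + 3].upper())

def ta_reads_per_million(reads, diag_pos, total_mapped_reads):
    """Count Ta-specific reads and scale by the total number of mapped reads.

    The per-million scaling is an assumption made for readability.
    """
    counts = Counter(classify_read(seq, start, diag_pos) for seq, start in reads)
    return counts["L1HS-Ta"] * 1e6 / total_mapped_reads, counts
```

Reads that do not span the diagnostic site are left unclassified rather than assigned to a subfamily, so only informative reads enter the count.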

Figure 1

Global expression of L1HS elements in a panel of human somatic cell lines.

(a) ORF1p immunoblot analysis of L1 RNP accumulation in the indicated cell lines. Top, ORF1p immunoblot. Bottom, S6 Ribosomal Protein immunoblot as loading control. The quantity of RNP loaded is indicated at the bottom of the gel. (b) Global estimate of L1HS-Ta RNA levels obtained by counting RNA-seq reads mapping against the L1HS consensus and containing the Ta-specific ACA diagnostic signature, normalized by the total number of reads mapping in the human reference genome (hg19) (mean ± s.e.m., n=2 except for MCF-7 where n=4, and HCT 116 where n=1). This analysis is based on stranded polyA+ RNA-seq data (Supplementary file 1). None were available for MRC-5 and HEK-293 cells, but data obtained from the hESC line H1 were included. See also Figure 1—figure supplement 1.

https://doi.org/10.7554/eLife.13926.003

A comprehensive map of L1HS-Ta elements in a panel of human cells

We next asked whether the observed expression of L1HS-Ta in some cells results from the transcriptional activation of all or most L1HS-Ta genomic copies, or only of a few of them. As a first step, we mapped the genomic location of all L1HS-Ta elements in each cell line of our panel. To achieve this task, we adapted for deep sequencing an existing method termed ATLAS (Badge et al., 2003). In brief, ATLAS-seq relies on the random mechanical fragmentation of the genomic DNA to ensure high-coverage, ligation of adapter sequences, suppression PCR-amplification of L1HS-Ta element junctions, and Ion Torrent sequencing using single-end 400 bp read chemistry (Figure 2a–c and Figure 2—figure supplement 1a, see also Materials and methods section). A notable aspect of ATLAS-seq is that we combine amplification and mapping of L1 downstream flanking sequence as well as the L1 upstream flanking sequence of full length elements (Figure 2a–c). This allows the unambiguous identification of full-length and potentially retrotransposition-competent genomic instances. In total, ATLAS-seq identified 7823 high-confidence L1HS-Ta insertions in the 12 cell lines analyzed, corresponding to 1633 distinct loci and including 358 full length elements (22%) (Supplementary files 2 and 3). The human reference genome hg19 contains 485 L1HS-Ta insertions with a detectable 3' end. On average (± s.d.), each cell line contains 652 (±68) L1HS-Ta copies, including 178 (±12) full-length elements. Among them 393 (±10) are reference insertions, 179 (±18) are non-reference L1HS-Ta previously identified as L1 insertion polymorphisms and catalogued in euL1db (Mir et al., 2015), and 80 (±60) are novel insertions (Figure 2d). ATLAS-seq recovers 98% of the L1HS-Ta elements previously described as fixed in the human population (see Materials and methods and Figure 2—figure supplement 1b), showing that this mapping approach is close to being comprehensive. 
To further validate the L1HS-Ta elements mapped by ATLAS-seq, we randomly selected and tested by PCR 72 non-reference insertions identified in HEK-293T cells with a broad range of supporting ATLAS-seq reads. Primers could be designed for 70/72 loci. We validated 66/70 of the tested L1HS-Ta, giving a true positive rate of 94% (Supplementary file 4). One fifth of the L1 loci are present in all tested cell lines and approximately 40% of them are present in only one of the cell lines. The remaining insertions show an intermediate level of polymorphism among the studied cell lines (Figure 2—figure supplement 1c). Finally, each pair of distinct diploid normal fibroblast lines (IMR-90, BJ and MRC-5 in our panel) differs at an average of 298 positions with regards to the presence or absence of a specific L1HS-Ta copy, in remarkable agreement with previous estimates (Figure 2—figure supplement 1d) (Ewing and Kazazian, 2010). Collectively, these data reinforce the notion that L1HS-Ta elements are highly polymorphic and contribute to the diversity of the human genome.
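The pairwise comparison of cell lines amounts to counting loci present in one genome and absent from the other. The sketch below shows this with Python sets, under the simplifying assumption that each insertion is already reduced to a single comparable locus identifier; the function names and the identifiers in the usage note are hypothetical, and real calls would need a coordinate tolerance when matching loci.

```python
from itertools import combinations

def pairwise_differences(insertions_by_line):
    """For every pair of cell lines, count loci present in exactly one of the two."""
    return {
        (a, b): len(insertions_by_line[a] ^ insertions_by_line[b])
        for a, b in combinations(sorted(insertions_by_line), 2)
    }

def mean_pairwise_difference(insertions_by_line):
    """Average number of presence/absence differences over all pairs."""
    diffs = pairwise_differences(insertions_by_line)
    return sum(diffs.values()) / len(diffs)
```

For example, with toy sets `{"BJ": {"a", "b", "c"}, "IMR-90": {"a", "d"}, "MRC-5": {"a", "b"}}` the three pairs differ at 3, 1 and 2 loci, giving a mean of 2.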

Figure 2

The genetic landscape of L1HS-Ta insertional polymorphisms in 12 human somatic cell lines.

(a) Principle of the ATLAS-seq procedure. The subsequent in silico steps are described in Figure 2—figure supplement 1a. (b–c) Modified IGV genome browser views (Thorvaldsdóttir et al., 2013) of two non-reference polymorphic L1 instances detected in MCF7 cells (b, full length L1, note the two adjacent 5'- and 3'-ATLAS-seq peaks; c, truncated L1). (d) L1HS-Ta insertions found in the various cells of the studied panel. See also Figure 2—figure supplement 1.

https://doi.org/10.7554/eLife.13926.005

Identification of a molecular signature of transcribed L1HS-Ta copies in MCF7 cells

Full-length L1 elements contain internal sense and antisense promoters located in their 5'-untranslated region (UTR). To gain insight into the pattern of L1 expression in human somatic cells at single copy resolution, we noted that the weak L1 polyadenylation signal in the 3' UTR allows a fraction of L1 transcripts to extend into the downstream genomic sequence (Holmes et al., 1994). This property enables us to use these 3’ downstream transcripts as a measure of L1 transcriptional activity, thereby circumventing the difficulties associated with attempting to unambiguously determine the origin of transcript sequences derived from within the highly-identical L1 elements (Figure 3a and Figure 3—figure supplement 1). Similarly, the L1 antisense promoter activity generates antisense transcripts extending into the upstream genomic region flanking the L1 element (Speek, 2001; Cruickshanks and Tufarelli, 2009; Rangwala et al., 2009; Macia et al., 2011; Denli et al., 2015). To identify such transcripts, we performed paired-end (2x150 bp) and stranded poly(A)+ RNA sequencing (RNA-seq, Supplementary file 2) from the highly L1-expressing breast cancer cell line MCF7 (Figure 1). Then we used sense RNA-seq tags downstream of the L1 elements mapped by ATLAS-seq, or antisense RNA-seq tags upstream of them, as a proxy to monitor the sense and antisense promoter activities of each individual copy, respectively (Figure 3a).
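As an illustration of this proxy, the following Python sketch computes the strand-aware window immediately 3' of an L1 copy and converts a fragment count into FPKM. The 1 kb window and the 0.05 FPKM expression threshold follow the Figure 3 and Figure 4 legends; the function names and the 0-based, half-open coordinate convention are assumptions made here, and counting the fragments themselves is left to an external tool.

```python
def downstream_window(chrom, start, end, strand, size=1000):
    """Return the interval immediately 3' of an L1 copy (0-based, half-open).

    For a non-reference insertion, start and end can both be set to the
    insertion point.
    """
    if strand == "+":
        return chrom, end, end + size, strand
    # On the minus strand, the 3' flank lies at lower genomic coordinates.
    return chrom, max(0, start - size), start, strand

def fpkm(fragments_in_window, window_bp, total_mapped_fragments):
    """Fragments per kilobase of window per million mapped fragments."""
    return fragments_in_window / (window_bp / 1e3) / (total_mapped_fragments / 1e6)

def is_expressed(value, threshold=0.05):
    """Apply the expression cut-off (strictly greater than the threshold)."""
    return value > threshold
```

Only fragments mapping to the same strand as the L1 copy should be counted in the downstream window; the antisense promoter proxy would use the mirror-image window upstream of the element on the opposite strand.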

Figure 3

Detection of transcriptionally active L1HS-Ta elements at individual copy resolution in MCF7 cells.

(a) Theoretical scheme representing the outcome of RNA-seq and ChIP-seq read mapping at polymorphic L1 loci. The informative regions are highlighted in beige. (b) Genome browser views of reference (left, *TTC28* locus) and non-reference (right, *NEDD4* locus) L1 instances integrated with RNA-seq (green) and H3K4me3 ChIP-seq data (blue). R1 and R2, replicate #1 and #2, respectively. (c) shRNA-mediated ORF1p knock-down. Top, immunoblot for ORF1p. Bottom, immunoblot for Actin, Tubulin and GAPDH as loading controls. R1, R2, and R3 are independent knock-down replicates performed in parallel and used subsequently for RNA-seq. Relative ORF1p levels normalized by the loading controls and scrambled shRNA controls are indicated between the two membranes. (d) Modified IGV genome browser views (Thorvaldsdóttir et al., 2013) of the *TTC28* (left) and *NEDD4* (right) L1 instances with RNA-seq data upon ORF1p shRNA-mediated knock-down. The informative L1 downstream region is highlighted in beige. Only one biological replicate out of three is shown for the sake of clarity. (e) Heat maps showing RNA-seq read accumulation 1 kb upstream and 1 kb downstream of each L1 copy. The downstream signal on the L1 strand (left heat map) is indicative of L1 sense promoter activity, while the upstream signal on the L1 antisense strand (right heat map) reflects L1 antisense promoter activity. L1 instances (rows) are sorted by decreasing L1 level of expression on the sense strand and the order is identical for the antisense strand. (f) Chromatin and transcription status around expressed (blue, FPKM of downstream RNA-seq tag>0.05) and non-expressed L1HS-Ta instances (pink). The indicated ChIP-seq and RNA-seq signals for each class of L1HS-Ta copies were aggregated and plotted centered around the position of the L1 insertion site. Note that the internal L1 region, when available (reference L1), is not included, but only its flanks. See also Figure 3—figure supplements 1 and 2.

https://doi.org/10.7554/eLife.13926.007

This strategy enabled us to clearly identify individual, expressed copies of full-length L1HS-Ta, including both fixed and polymorphic instances, as exemplified by the two loci depicted in Figure 3b. The first one is a reference L1HS-Ta element integrated in the TTC28 gene (22q12.1). Uniquely mapped RNA-seq tags are detected both in the body of the L1 sequence and in the immediate downstream genomic region. However, read mapping is performed against the reference human genome; thus, reads mapping within the L1 body could originate either from this particular reference L1 copy or from a non-reference L1 copy integrated somewhere else in the genome, and are therefore not informative. The second example is a non-reference L1HS-Ta element inserted in the NEDD4 gene (15q21.3) in opposite orientation. Again, a downstream RNA-seq peak immediately follows the insertion point. Interestingly, in both cases, we do not detect upstream antisense RNA-seq tags at the vicinity of these full-length L1HS-Ta copies, suggesting that sense and antisense L1 promoter activities - or the stability of their respective transcripts - are not necessarily coupled in a chromosomal environment, in agreement with previous results obtained from plasmid borne reporter assays (Macia et al., 2011).

Several lines of evidence confirmed that the downstream RNA-seq peaks emanate from these L1 copies and not from other overlapping genic or non-genic transcripts, or from distinct L1 copies carrying a 3' transduction (see below). First, we observed H3K4me3 ChIP-seq peaks immediately upstream and adjacent to these L1 copies, a histone mark reflecting active or poised promoters (Figure 3a–b) (Bernstein et al., 2012). Although the H3K4me3 signal is expected to be centered on the internal promoter within each L1 copy, a region that is either non-uniquely mappable or not included in the reference genome sequence, the ChIP-seq signal can be readily detected in the flanking genomic sequence. Second, we performed shRNA-mediated knockdown of all L1HS-Ta transcripts, targeting the ORF1 sequence (Figure 3c). Two different ORF1 shRNAs greatly reduced the downstream RNA-seq signal when compared to a scrambled shRNA or to unmanipulated cells (Figure 3d). Together, these data indicate that downstream RNA-seq tags originate from the same transcriptional unit as the considered L1. Retrotransposition of L1 transcripts which include downstream genomic sequences can result in duplication of these sequences at the new insertion site, a phenomenon termed 3’-transduction (Holmes et al., 1994; Moran et al., 1999). Thus, it is possible, in principle, that downstream RNA-seq tags mapping to a particular L1HS-Ta copy could in fact emanate instead from a daughter copy with a 3' transduction, located elsewhere in the genome. However, the concomitant presence of an upstream H3K4me3 mark at many transcriptionally active L1 copies renders this situation very unlikely in most cases.

L1HS-Ta transcription originates from a small number of cell-type-specific loci

To obtain a comprehensive view of the transcriptional activities of individual L1 copies, we applied this integrative approach to all the full-length L1HS-Ta elements identified by ATLAS-seq in MCF7 (Figure 3e). Strikingly, only 5 L1HS-Ta copies show relatively high expression, and approximately 15 more copies exhibit low but detectable levels of expression. In addition, only 4 L1HS-Ta loci show evidence of L1 antisense promoter activity, including instances that are distinct from the few copies expressing high levels of L1 sense transcripts, suggesting uncoupling and differential regulation between these two L1 promoter activities. Consistent with the examples described at the TTC28 and NEDD4 loci (Figure 3b), transcribed L1HS-Ta loci at the genome-wide level have a dual fingerprint with upstream active chromatin marks (H3K4me3 and H3K27ac) and Pol-II and downstream RNA-seq signal (Figure 3f); the latter being reduced upon shRNA-mediated L1 knockdown (Supplementary file 2 and Figure 3—figure supplement 2a). The expression pattern of individual L1HS-Ta loci in standard growth conditions is highly reproducible, as revealed by the clustering of independent RNA-seq experiments obtained from the same cell line grown in two independent laboratory environments (Figure 4a, MCF7_Cristofari and MCF7_ENCODE samples, Pearson correlation r=0.830 p<0.0001), highlighting the non-random nature of this process and the robustness of our overall approach.

Figure 4

Locus- and cell-type-specific reactivation of individual L1HS-Ta copies in normal and transformed cells.

(a) Heat map displaying expression levels of each L1 instance in each of the analyzed cell lines. Expression level is defined as the number of RNA-seq fragments mapped in a 1 kb window downstream of a particular L1 copy and on the same strand, normalized by the window length and the total number of mapped fragments (FPKM). Grey, absent polymorphic L1 copy. Most cell lines have at least two RNA-seq replicates (R1 and R2), which cluster based on their L1 expression profiles, showing their cell-line specificity. (b) The bulk of L1HS-Ta transcripts is produced by a limited number of loci. Scatter plot showing the number of L1 copies contributing to half of the total pool of L1HS-Ta transcripts. The y-axis represents the total L1 downstream tag FPKM count for each cell line. The x-axis represents the number of L1 loci contributing to half of this total FPKM. (c) Distribution of expressed and non-expressed L1 insertions in genic and non-genic regions. Bar chart indicating the fraction of L1 copies in genic (dark grey) and non-genic (white) regions with associated pie charts indicating the proportion of non-expressed L1 (light blue) and L1 expressed in at least 1 cell line of the panel (dark blue). The distribution of expressed L1 insertions is not statistically different between genic and non-genic regions (p=0.117, binomial test). (d) Expression levels of genes associated with non-expressed or expressed L1 copies. Values of gene expression are considered independently for each cell line of the panel, and grouped according to whether the L1 insertion is expressed (>0.05 FPKM) or not (≤0.05 FPKM) in each particular cell line. White oval shows the median; black box lower and upper limits indicate the 25th and 75th percentiles, respectively; whiskers extend to 1.5 times the interquartile range; violin shape represents density estimates of the data and extends to extreme values (out of scale range). Genes containing expressed L1s are more highly expressed than genes containing non-expressed L1s (p<0.001, Kolmogorov-Smirnov test). See also Figure 4—figure supplements 1, 2 and 3.

https://doi.org/10.7554/eLife.13926.010

To expand these observations to a range of cell-types we applied the same RNA-seq-based analysis strategy to the complete panel of cell lines (except MRC-5 and HEK-293 cells for which stranded poly(A)+ RNA-seq data were not available in public databases, Figure 4—figure supplement 1). The identity of the individual L1HS-Ta loci which are expressed, as well as their levels of expression, varies considerably between cell types (Figure 4a and Supplementary file 5). Strikingly, many L1HS-Ta elements which are present at the same genomic location in distinct cell types are differentially expressed, indicating that L1HS-Ta re-activation in transformed cells may result from cell-type- and copy-specific regulation.

As for MCF7, L1HS-Ta expression profiles are similar between biological replicates of all cell lines, and global shRNA-mediated ORF1 knockdown in 2102Ep, followed by RNA-seq, again confirmed the L1-derived origin of downstream transcripts in this cell line (Supplementary file 2 and Figure 3—figure supplement 2b). The total levels of RNA-seq tags 1 kb downstream of L1HS-Ta for a given cell line are highly correlated with the number of internal L1 reads containing the ACA diagnostic nucleotides (Spearman r=0.8169, p<0.0001), reinforcing the idea that RNA-seq tags downstream of L1 can be used as a reliable proxy for L1 sense transcription. RT-PCR validation using primers anchored in the 3' UTR of L1 and in the downstream flanking genomic sequence confirmed the transcriptional state deduced from RNA-seq, for a selection of expressed and non-expressed L1HS-Ta elements (Figure 4—figure supplement 2). Interestingly, for most L1HS-Ta-expressing cells, only 5 to 15 individual copies account for the bulk of L1HS-Ta RNA, defined here as half of the total FPKM count (Figure 4b). Compared with MCF7 cells, the embryonal carcinoma cells 2102Ep accumulate comparable global levels of L1HS-Ta transcripts, but seem to have a higher number of permissive L1HS-Ta loci, each contributing a smaller proportion of the total, although the number of active instances still represents a small fraction (<10%) of all L1HS-Ta copies in these cells.
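The number of loci accounting for half of the total downstream FPKM can be computed by ranking loci by expression and accumulating their contributions. The Python sketch below is one straightforward way to do so; the function name and the dictionary input are hypothetical, and the values in the usage note are toy numbers rather than data from the study.

```python
def loci_for_half_of_total(fpkm_by_locus):
    """Smallest number of loci whose summed FPKM reaches half of the total."""
    values = sorted(fpkm_by_locus.values(), reverse=True)
    total = sum(values)
    if total == 0:
        return 0  # no detectable expression in this cell line
    running = 0.0
    for n, value in enumerate(values, start=1):
        running += value
        if running >= total / 2:
            return n
    return len(values)
```

With toy values `{"a": 4, "b": 3, "c": 2, "d": 1}` the two most expressed loci already reach half of the total of 10, so the function returns 2. A cell line dominated by a few loci gives a small number, whereas an even distribution across many loci gives a large one.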

Relationship of expressed L1HS-Ta copies with genes and copy-number variants

To test whether the expression of L1HS-Ta copies could be influenced by their genic environment, we first compared the proportions of expressed and non-expressed L1 insertions located in genes. Approximately 1/3 of all full-length L1HS-Ta copies were inserted in genes (Figure 4c). Although the proportion of expressed L1s in genes was slightly higher than that of non-expressed copies, the difference was not significant (p=0.117, binomial test). Then, we focused on the genic cohort of full-length L1HS-Ta and asked whether expressed and non-expressed elements were differentially oriented relative to the overlapping genes. Consistent with previous observations (Szak et al., 2002), genic L1 copies are more often found in the antisense orientation (Figure 4—figure supplement 3a). However, this proportion was not significantly different between expressed and non-expressed copies. Independently of their orientation relative to genes, we found that genes containing expressed L1 are often more highly expressed than those containing non-expressed L1 (Figure 4d), suggesting that highly expressed gene loci might represent a favorable genomic environment for L1 reactivation. Finally, it is conceivable that expressed L1 could be located in larger chromosomal regions having undergone massive amplification. To address this possibility, we examined whether L1 insertions were located in genomic regions showing copy number variations (CNVs). The majority of L1 copies were inserted in normal regions (Figure 4—figure supplement 3b), whether they were expressed or not, and expressed L1 copies were not significantly enriched in amplified regions.

Many transcribed L1HS-Ta are retrotransposition-competent

Finally, we determined whether the most highly expressed L1HS-Ta elements in the panel of cell lines have the ability to achieve complete retrotransposition cycles and to generate new copies. To answer this question, we combined complementary strategies. First, we collated published data of retrotransposition assays in cultured cells obtained for different L1 instances (Brouha et al., 2003; Beck et al., 2010), and combined them with our own experimental results obtained with an additional newly-identified L1 copy following the same protocol (Figure 5b). Second, we compared our set of highly-expressed L1HS-Ta copies with those which have been identified in earlier studies as mobilization-competent based on the detection of daughter copies with matching 3’-transduced sequences (Tubio et al., 2014). Third, and most directly, we specifically searched for evidence of 3' transduction in our 3' ATLAS-seq data using split reads partly mapping downstream of two distinct L1HS-Ta copies (see Materials and methods and Figure 5c). We found that 5 out of the 20 most highly-expressed L1HS-Ta copies across all cell lines fulfill at least two of these criteria, strongly supporting their ability to retrotranspose, and 6 additional ones could be identified as progenitors of other, daughter copies. The remaining nine elements could also be retrotransposition-competent, but they have not been tested in cell culture assays, and no daughter L1 copies could be unambiguously assigned to them. Thus, in total, at least 11 out of the 20 most highly-expressed L1HS-Ta copies across all cell lines are retrotransposition-competent (Figure 5a and Supplementary file 5).

Figure 5

Evidence of retrotransposition capability for selected L1HS-Ta copies.

(a) Evidence of retrotransposition competence for the top 20 most expressed L1 copies across all cell lines analyzed. Cellular assays refer to retrotransposition cellular assays of plasmid-borne L1 instances, whose expression is driven by either the native L1 5’ UTR alone (Brouha et al., 2003) or supplemented by a strong CMV promoter ([Beck et al., 2010] and Figure 5b). These assays measure L1 intrinsic biochemical activity, independently of their actual expression in their genomic context. Three-prime transduction refers to the existence of progeny copies containing a 3' transduction, which can be traced back to the original locus and reflects a retrotransposition event. (b) Retrotransposition assay in cultured cells for MCF7 L1 copy EXP_ID_0447 (*NEDD4* locus). A full-length transcribed L1HS-Ta copy present in the genome of MCF7 cells was subcloned by PCR into an expression vector containing a reporter gene to measure retrotransposition activity, generating four independent clones (pVan610-1 to -4). In transfected HeLa cells, *de novo* retrotransposition events of engineered L1 copies lead to the introduction of a functional genomic copy of the neomycin phosphotransferase gene, whose expression confers resistance to G418. Resistant foci were stained and counted to monitor retrotransposition activity compared to the positive (pJM101/L1.3, wild type L1HS-Ta) and negative (pJM105/L1.3, mutant L1HS-Ta) control conditions. The number of G418-resistant colonies obtained with the positive control was set to 100%. A picture of a representative well with stained colonies is displayed for illustrative purposes under each bar of the graph. The average value of three biological replicates is displayed with error bars corresponding to the standard deviation among the three biological replicates. (c) Detection of 3' transductions in ATLAS-seq data. This in silico screen identifies L1HS-Ta copies (progeny elements) with ATLAS-seq clusters containing reads with non-aligning subsequences (soft-clipped), which uniquely map downstream of and adjacent to another full-length L1HS locus (progenitor element). The panel shows a genome browser view of such a 3' transduction, originating from a full-length L1HS-Ta in the *TTC28* gene (22q12.1). The soft-clipped region of the reads is shown in color (base code: T, red; A, green; C, blue; G, orange). As expected, the transduced region is flanked by 2 poly(A) tails (poly(T) here since it is located on the reverse genomic strand).

https://doi.org/10.7554/eLife.13926.014

Some L1 elements with high retrotransposition activity (‘hot’ L1) belong to well-defined lineages with distinctive 3’ transductions. To evaluate the proportion of the highly expressed elements which belong to such lineages, we screened 3’ ATLAS-seq reads supporting each L1 insertion for sequence tags characteristic of the three best-characterized lineages (AC002980, LRE3 and RP, see Materials and methods) (Schwahn et al., 1998; Brouha et al., 2002; Myers et al., 2002; Beck et al., 2010; Macfarlane et al., 2013). Among the 20 most highly expressed L1HS-Ta copies, we found only 1 insertion (EXP_ID_0447, in the *NEDD4* gene) deriving from the L1_RP transduction family (Supplementary file 5). Nine additional copies were also part of one or another lineage, but were not expressed – or only moderately – in any of the cell lines analyzed. Thus, these findings suggest that the observed high level of expression of a small cohort of L1 insertions is not an intrinsic feature of any previously identified lineage.

Discussion

Altogether, our observations support a model (Figure 6) where: (i) L1HS-Ta transcription is predominantly inactive in somatic human cells, including transformed cell lines; however (ii) a small number of L1HS-Ta copies can escape silencing, allowing their expression and transcript accumulation; (iii) the locus in which a particular L1HS copy integrates has a major influence on its ability to be subsequently reactivated; (iv) L1 instances at distinct genomic loci are subject to cell-type dependent activation (potentially dependent on environmental or physiological signals). This model is consistent with previous observations made on a few L1 instances suggesting that the transcriptional activity of the L1 promoter might be influenced by its immediate upstream genomic sequence (Lavie et al., 2004).

Figure 6

Schematic model showing the highly locus-specific and variable expression of L1HS-Ta elements among different somatic cell types and individuals.

The colored boxes correspond to L1HS-Ta copies, some being polymorphic (pink). The model is developed in the main text.

https://doi.org/10.7554/eLife.13926.015

L1 retrotransposon expression has been proposed both as a potential biomarker of cancer prognosis and as the starting point for L1-mediated genome instability in tumors (Piskareva et al., 2011; Rodić and Burns, 2013; Rodić et al., 2014). Hence, understanding the means by which L1s can escape regulation in particular cancers is vital in order to improve their rational use as biomarkers or to predict their possible effects on disease progression. Strikingly, our results indicate that – in the cancer cell lines studied – the general cellular regulation of most L1 instances is unimpaired, even in cells exhibiting high L1 activity. Thus, it appears that shut-off of global L1-regulatory pathways is not a prerequisite for L1 activation in cancer. Instead, we find that only a very limited set of L1 copies, at specific genomic loci, becomes activated in cancer cells. We note that the presence or absence of polymorphic L1s at these permissive loci, and their degree of transcriptional activation, may represent risk factors for particular cancer types, and more specific biomarkers than global L1 expression or methylation status.

Furthermore, we provide here a unique resource consisting of near-complete maps of L1HS-Ta elements present in widely used normal and transformed model cell lines, for which the availability of genomic datasets is regularly increasing (including several tier-1 and tier-2 cell lines of the ENCODE project). These maps will be of broad utility in the future to address the impact and regulation of transposable elements in the human genome. Indeed, isolated RNA-seq and ChIP-seq signals resulting from the presence of non-reference L1 copies can only be correctly interpreted if such maps are available. Thus, they can act as a platform to fully interpret and profit from the many expanding public datasets generated using the same cell lines.

Recently, whole genome sequencing of human tumors has revealed recurrent retrotransposition events stemming from a handful of source elements (Tubio et al., 2014), consistent with the notion that only a fraction of all full-length L1 elements is actually capable of retrotransposition (Brouha et al., 2003; Beck et al., 2010). Here, we have identified highly heterogeneous expression of individual L1HS-Ta copies, implicating L1 transcriptional activation as a key regulatory process which limits the mutagenic potential of L1 elements independently of their respective intrinsic biochemical activity. Therefore, we extend the concept of retrotransposition-competent L1 copies, previously described as 'hot L1s' (Brouha et al., 2003; Beck et al., 2010), to the transcriptional regulation of each individual locus and cell type. We conclude that L1-mediated mutagenesis results from the reactivation of a small subset of permissive loci, only a fraction of which contains retrotransposition-competent elements, combined with a favorable cellular environment (i.e. diminished restriction factor and/or increased cofactor activities [Pizarro and Cristofari, 2016]). Several of these regulated loci are polymorphic with regard to the presence or absence of an L1 element in the human population, highlighting the role of genetic determinants in the global L1 mutagenic potential of a given individual. Overall, our data suggest that activation of L1 transcription in somatic cells is governed by individual-, locus-, and cell-type-specific determinants and provide a framework to study how distinct L1HS-Ta copies may be regulated by environmental, physiological and pathological triggers. Future work will determine the factors and cellular signaling pathways that contribute to the transcriptional reactivation of the different L1HS-Ta copies in somatic cells.

Author details

Claude Philippe

Dulce B Vargas-Landin

Aurélien J Doucet

Dominic van Essen

Jorge Vera-Otarola

Monika Kuciak

Antoine Corbin

Pilvi Nigumann

Gaël Cristofari (for correspondence)
