Pervasive transcription read-through promotes aberrant expression of oncogenes and RNA chimeras in renal carcinoma

Abstract

Aberrant expression of cancer genes and non-canonical RNA species is a hallmark of cancer. However, the mechanisms driving such atypical gene expression programs are incompletely understood. Here, our transcriptional profiling of a cohort of 50 primary clear cell renal cell carcinoma (ccRCC) samples from The Cancer Genome Atlas (TCGA) reveals that transcription read-through beyond the termination site is a source of transcriptome diversity in cancer cells. Amongst the genes most frequently mutated in ccRCC, we identified SETD2 inactivation as a potent enhancer of transcription read-through. We further show that invasion of neighbouring genes and generation of RNA chimeras are functional outcomes of transcription read-through. We identified the BCL2 oncogene as one such invaded gene and detected a novel chimera, CTSC-RAB38, in 20% of ccRCC samples. Collectively, our data highlight a novel link between transcription read-through and aberrant expression of oncogenes and chimeric transcripts that is prevalent in cancer.

https://doi.org/10.7554/eLife.09214.001

eLife digest

Mutations in genes play important roles in many types of cancer. However, mutations alone cannot explain all the biological changes that occur to cancer cells. For example, very few mutations have been linked with a type of kidney cancer called clear cell renal cell carcinoma (or ccRCC for short). Instead, scientists suspect that this cancer is largely caused by changes in the expression of particular genes so that certain cancer-promoting genes are more highly expressed, while other genes that would prevent tumor growth become less active.

One of the few genes that is often mutated in ccRCC is called SETD2. This gene is involved in processes that alter the structure of DNA, but do not alter the genes themselves. These “epigenetic” changes can alter how the instructions in genes are used to make proteins. The first step in making proteins is to use a section of DNA as a template to make molecules of messenger ribonucleic acid (mRNA) in a process called transcription. There are markers within a gene that show where transcription should start and stop to produce the mRNA required to make a particular protein. Epigenetic changes can mask these markers so that the cell produces longer mRNAs that incorporate instructions from neighboring genes.

It was not known how often these stop signs are ignored in ccRCC cells. Here, Grosso et al. compared transcription in normal cells and in ccRCC tumor cells from 50 different patients. The experiments show that more stop signs were ignored in many of the cancer cells, especially in cells with mutations in SETD2. This caused all or parts of neighboring genes to be transcribed along with the target gene and led to changes in the expression levels of these genes. For example, a cancer-promoting gene called BCL2 was more highly expressed in these cells.

Furthermore, some of the mRNA molecules produced in these cancer cells may make “fusion” proteins that combine elements from several proteins. These fusion proteins may work differently to normal cell proteins and therefore might also promote the development of tumors. Grosso et al.’s findings reveal a new link between epigenetic changes and cancer.

https://doi.org/10.7554/eLife.09214.002

Introduction

Clear cell renal cell carcinoma (ccRCC) is the most common histological subtype of renal carcinoma. The genetics of ccRCC is dominated by either somatic or germline inactivating mutations in the VHL gene. Regarding the full spectrum of genomic alterations, ccRCC ranks amongst solid tumors with the lowest average number of point mutations, small indels (Kandoth et al., 2013) and somatic copy number alterations (Zack et al., 2013). These findings suggest that epigenetic events make a significant contribution to the deregulation of the oncogenic and tumor suppressor gene expression programs that drive ccRCC development and progression. In fact, mutations in ccRCC are frequently observed in epigenetic factors such as the chromatin-remodeler PBRM1 and the histone modifying enzymes BAP1 and SETD2, highlighting the central role of epigenetic regulation in this particular cancer (Duns et al., 2010; Varela et al., 2011; Dalgliesh et al., 2010; Creighton et al., 2013). Such mutations in genes that control the epigenome can strongly modulate the landscape of the tumor transcriptome via aberrant expression of global sets of genes. For instance, defects in transcription termination lead to read-through beyond the annotated 3’ gene boundary and have the potential to severely modify the transcriptome and to risk the integrity of vital gene expression programs (Kuehner et al., 2011). Paradoxically, the prevalence and functional outcome of transcription read-through have not been thoroughly scrutinized in any cancer before. Here, we report an unprecedented transcriptional profiling of a cohort of 50 pairs of ccRCC tumor and normal matched samples from The Cancer Genome Atlas (TCGA). We show that transcription read-through is prevalent in ccRCC and that high levels of transcription read-through correlate with poor survival rates.
Amongst the most frequently mutated genes in ccRCC, we identify SETD2 inactivation as a major driving force of impaired transcription termination and high levels of read-through. Moreover, we show that transcription read-through overruns and interferes with the expression of downstream genes. We identify the anti-apoptotic oncogene BCL2 as one such gene, thereby illustrating a new mechanistic basis for the transcriptional deregulation of oncogenes. In addition, our transcriptome analyses revealed recurrent RNA chimeras generated from read-through episodes in ccRCC. RNA chimeras are common features of cancer cells formerly thought to be produced solely by chromosomal translocations. We now know that many chimeric transcripts can originate from DNA-independent events such as trans-splicing, RNA recombination or transcription read-through (Gingeras, 2009). Our analyses revealed that read-through is a major source of RNA chimeras in ccRCC and identified a novel chimera, CTSC-RAB38, in 20% of ccRCC tumors, but not in any normal matched sample. Altogether, our data disclose the prognostic power of transcription read-through and emphasize its role as a major source of transcriptome diversity in ccRCC, namely via aberrant expression of cancer genes and RNA chimeras.

Results

Transcription read-through is frequent in ccRCC

To investigate the prevalence of transcription read-through in ccRCC, we analysed RNA-seq data from 50 pairs of tumor and normal matched samples from TCGA (Cancer Genome Atlas Research Network, 2013). Compared to normal tissue, all tumor samples exhibited several genes with transcription termination defects, revealed by a high number of reads mapping downstream of the transcription termination site (TTS) (Figure 1A,B and Supplementary file 1A). Such accumulation of reads results from transcription read-through beyond the TTS, a surrogate for deficient transcription termination (Higgs et al., 1983). In agreement with a defect in transcription termination, we did not detect differences in read counts in any region upstream of the TTS, in contrast with the significant increase in the intergenic region immediately downstream of this site (Figure 1C).

Transcription read-through is prevalent in ccRCC.

(A) Top graph depicts the number of genes with transcription read-through per ccRCC sample. The heatmap illustrates the genes with (blue) and without (grey) transcription read-through. The left graph indicates the percentage of samples in which read-through is observed for each individual gene (n = 50 tumor/matched normal ccRCC TCGA samples). (B) Heatmap representation of the RNA-seq profile distribution and fold change after the TTS region of genes with transcription read-through in one representative TCGA ccRCC sample (patient barcode TCGA-CZ-5465) of a total of 50 tumor and matched normal pairs analysed. The gene body region was scaled to 60 equally sized bins and ± 4 Kb gene-flanking regions were averaged in 100-bp windows. The left panel shows the read counts (log2 RPKMs) of the matched normal tissue in all genes with read-through and the right panel shows the fold-change (log2) of read counts between the tumor and the matched normal tissue. Genes are ordered according to the read-through length. Scales and colour keys for each panel are depicted at the bottom. (C) Metagene analysis of RNA-seq profiles for tumor and matched normal tissue from one ccRCC patient. *p<0.05 by Student’s T-test.

https://doi.org/10.7554/eLife.09214.003

We then examined whether global deregulation of gene expression at the level of transcription termination affects overall survival rates of ccRCC patients. To this end, we segregated the TCGA samples into two categories: ‘high read-through’ samples (those with more than 200 genes with read-through) and ‘low read-through’ samples (fewer than 200 genes with read-through) (Figure 2A). We found that patients with a ‘high read-through’ phenotype died significantly earlier than patients with a ‘low read-through’ phenotype (p = 0.008, log-rank test; Figure 2B).

Transcription read-through correlates with ccRCC survival rates.

(A) The top graph indicates the number of genes with transcription read-through on each ccRCC patient sample. Samples were split in two groups according to the number of genes with transcription read-through (low or high), using 200 genes as a cut-off. The heatmap represents the RNA-seq tumor/matched normal fold change 4 Kb after the TTS region of genes with transcription read-through. (B) Kaplan-Meier plot comparing the survival of patients separated into ‘high read-through’ and ‘low read-through’ subsets as defined in A. (C) Proportion of ccRCC patient samples with low and high transcription read-through. Results are shown for samples containing any of the most recurrently mutated genes in ccRCC: SETD2, VHL, PBRM1, BAP1 and MTOR. Proportions of high and low read-through were significantly different between samples carrying mutation in SETD2 and in any of the remaining genes (Fisher’s Exact Test p<0.05).

https://doi.org/10.7554/eLife.09214.004

To estimate the contribution of the five most frequently mutated genes in ccRCC (Cancer Genome Atlas Research Network, 2013) to the observed transcription termination defects, we calculated the percentage of samples carrying any of these mutations that fall within each of the two subsets defined above. Samples with mutations in the histone methyltransferase SETD2 scored preferentially in the ‘high read-through’ category (58%) (Figure 2C). In contrast, samples carrying mutations in any of the four other frequently mutated genes in ccRCC segregated equally between both groups (BAP1) or mainly in the ‘low read-through’ category (VHL, PBRM1 and MTOR; Figure 2C). Mutations in other genes known to be required for transcription termination or pre-mRNA processing were rare and did not segregate specifically in any category (Supplementary file 1B). These data suggest that widespread transcription read-through is a distinctive hallmark of ccRCC and identify SETD2 mutations as a putative contributing factor to this phenotype.

To further investigate the correlation between SETD2 mutations and transcription read-through in ccRCC, we performed RNA sequencing of 2 SETD2 wild type (wt) and 4 SETD2 mutant ccRCC cell lines previously reported to have a marked reduction of H3K36me3 levels (Duns et al., 2010; Carvalho et al., 2014). The wt ccRCC cell line Caki2 showed the highest SETD2 expression levels, comparable with those of a non-cancer kidney epithelial cell line (HEK293) (Figure 3A). For this reason, we chose Caki2 as the reference dataset for the pairwise analyses of the remaining wt (Caki1) and mutant (MF, AB, ER, FG2) samples. These analyses revealed transcription read-through in hundreds of genes in all ccRCC cell lines, with a significantly higher incidence in all SETD2 mutant cells (Figure 3B and Supplementary file 1C). A metagene analysis of genes with transcription termination defects in SETD2 mutant cells revealed that expression levels vary significantly downstream of the TTS, but not within the gene body or at the promoter region (Figure 3C). Moreover, heatmaps of the fold change of read counts between different cell lines depict a strong increase in the number of read-through reads in SETD2 mutant cells (Figure 3D and Figure 3—figure supplement 1). To rule out the contribution of different genetic backgrounds in these cell lines, we interrogated RNA-seq data from: SETD2 knockout (KO) ccRCC cells (Ho et al., 2015); SETD2-depleted human mesenchymal stem (MS) cells (Luco et al., 2010); and embryonic stem (ES) cells from Setd2 KO mice (Zhang et al., 2014). Again, when compared to wt cells, SETD2-deficient cells displayed increased read counts specifically in the region immediately downstream of the TTS, consistent with transcription read-through events (Figure 3E,F; Figure 3—figure supplement 2 and Supplementary file 1D,E,F).
Similar results were obtained upon RT-qPCR measurements of transcription read-through in three distinct genes 48 hr after depletion of SETD2 from wt ccRCC cells by RNA interference (Figure 3—figure supplement 3).

SETD2 mutations promote transcription read-through in ccRCC.

(A) SETD2 expression levels (FPKMs) in HEK293 and ccRCC cell lines. (B) Number of genes with transcription read-through up to 4 Kb downstream of the TTS. *p<0.05 by Fisher’s Exact Test after comparing each SETD2 mutant cell line with the SETD2 wt cell line (Caki1). (C) Metagene analysis of genes showing transcription read-through in SETD2 mutant and wt ccRCC cell lines. The gene body region was scaled to 60 equally sized bins and ± 4 Kb gene-flanking regions were averaged in 100-bp windows. *p<0.05 by Student’s T-test. (D) Heatmap representation of RNA-seq profile distribution and fold change after the TTS region of genes showing transcription read-through. Genes were scaled and averaged as in C. The left panel shows the read counts (log2 RPKMs) of the SETD2 wt cell line (Caki2) in all genes with read-through. The two right panels show the fold-change (log2) between each ccRCC cell line and the reference SETD2 wt ccRCC cell line (Caki2). Genes are ordered according to the read-through length. Scales and colour keys for each panel are depicted at the bottom of the panel. (E) Metagene analysis (as detailed in C) of genes showing transcription read-through in SETD2 KO and wt 786-O ccRCC cells. *p<0.05 by Student’s T-test. (F) Analysis as described in D of RNA-seq data from SETD2 KO and control 786-O ccRCC cell lines.

https://doi.org/10.7554/eLife.09214.005

We then investigated whether wt SETD2 can rescue the transcription termination defects observed in SETD2 mutant ccRCC cells. To this end, we performed single-molecule RNA FISH after transient expression of GFP-tagged SETD2 in a SETD2 mutant ccRCC cell line (Figure 4A). RNAs produced upon transcription read-through of MRPL23 and SEL1L3 were visualized as foci obtained with RNA FISH probes targeting a region downstream of the termination site. These foci were present in significantly higher numbers in GFP-negative cells than in wt SETD2-GFP expressing cells (Figure 4B). This result reveals that expression of wt SETD2 is sufficient to revert the transcription termination defects observed in mutant cells. Altogether, these data support the view that SETD2 is necessary to guide correct transcription termination genome-wide and that mutations in this histone modifier gene cause aberrant transcription patterns in ccRCC.

SETD2 rescues the transcription termination defects of SETD2 mutant ccRCC cells.

(A) RNA FISH experiments on a SETD2-mutant ccRCC cell line (FG2) transiently expressing wt SETD2-GFP. Quasar570-labeled probes were designed against a region downstream of the termination sites of MRPL23 and SEL1L3. The arrows indicate single RNA transcripts generated by a transcription read-through event. (B) Quantification of the number of GFP-negative and GFP-positive cells containing more than 5 RNA FISH foci. Means and standard deviations from at least 50 cells from four individual experiments are shown. Scale bars: 10 μm. p<0.01 by Student’s T-test.

https://doi.org/10.7554/eLife.09214.009

Transcription read-through interferes with the expression of neighbouring genes

One possible functional consequence of aberrant transcription read-through is the invasion of adjacent downstream genes altering their expression levels. This trans-acting transcriptional interference mechanism may play important roles in cancer development and progression by deregulating the expression of relevant oncogenes and tumor suppressors (Proudfoot, 1986). To test whether overrunning of neighbouring genes is a frequent outcome of transcription read-through in ccRCC, we analysed the expression levels of the entire intergenic region and of the gene located immediately downstream. In agreement with our prediction, there was a statistically significant increase in read counts along the intergenic region and within the body of the downstream gene of a tandem pair (Figure 5A). This difference was still detected after the TTS of the downstream gene, but not within the upstream gene (Figure 5B). Overall, reading-through RNA polymerase II (RNAPII) molecules invaded an average of 20% of genes located downstream (Supplementary file 1A).

Transcription read-through overruns and interferes with the expression of neighbouring genes.

Heatmap (A) and metagene (B) profiles of the intergenic region and of the gene located downstream of a read-through event in one representative ccRCC TCGA sample (patient barcode TCGA-CZ-5465). Genes were scaled and averaged as in Figure 1. *p<0.05 by Student’s T-test.

https://doi.org/10.7554/eLife.09214.010

We then asked if read-through levels correlate with the expression levels of the downstream gene across the TCGA dataset. Notably, expression of 52 out of 903 genes (6%) exhibited a statistically significant correlation with read-through levels of the upstream gene (Figure 6A, Supplementary file 1G). Of these, 51 genes were positively correlated and only one gene showed a negative correlation. Amongst the positively correlated genes we found the anti-apoptotic oncogene BCL2. In support of its oncogenic role in kidney cancer, depletion of BCL2 with antisense oligonucleotides inhibits ccRCC tumor growth in vitro and in vivo (Uchida et al., 2001). We further observed that expression of BCL2 is frequently increased in the tumor samples of the TCGA dataset when compared to the matched normal samples (Figure 6B). Importantly, augmented levels of BCL2 mRNA and protein correlated positively with the levels of transcription read-through of the KDSR gene located immediately upstream (Figure 6B,C). Altogether, these data suggest that transcription termination defects interfere with the expression of neighbouring genes and illustrate a new paradigm for the aberrant expression of cancer-related genes, which may explain the upregulation of the BCL2 oncogene in ccRCC.
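The significance calls in this analysis rely on Benjamini-Hochberg adjusted p-values (adjusted p<0.05, as stated in the legend of Figure 6A). The following is a minimal sketch of that adjustment only, not the authors' script: the function name and the example p-values are hypothetical, and the correlation test that would produce the raw p-values is not specified in the text and is therefore left out.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p-values, in the input order.

    Hypothetical helper for illustration; the raw p-values would come from
    one correlation test per tandem gene pair.
    """
    m = len(p_values)
    # Indices of the p-values sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# Made-up p-values for four gene pairs (not data from the study).
example = benjamini_hochberg([0.01, 0.04, 0.03, 0.005])
significant = [q < 0.05 for q in example]
```

A gene pair would be reported as significantly correlated when its adjusted value falls below 0.05.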

Transcription read-through of the KDSR gene correlates with the expression of the BCL2 oncogene.

(A) Distribution of the correlation between the read-through and the expression levels of the downstream tandem genes. Significant correlation values (Benjamini-Hochberg adjusted p<0.05) are represented in blue. (B) Correlation between the expression levels of BCL2 and the read-through of the upstream KDSR gene. The graph depicts the fold-change of read counts for each tumor and matched normal pair. (C) Correlation between BCL2 protein levels and the KDSR read-through.

https://doi.org/10.7554/eLife.09214.011

Transcription read-through is a source of RNA chimeras in ccRCC

RNAPII elongation beyond the annotated gene boundaries and invasion of an adjacent gene as a result of impaired transcription termination may result in the formation of hybrid transcripts collectively called RNA chimeras (Gingeras, 2009). In addition to gene fusions (formed upon chromosomal translocations), RNA chimeras can originate from DNA-independent events such as trans-splicing, RNA recombination or transcriptional read-through (Gingeras, 2009). Our analysis of ccRCC RNA-seq data revealed a high number of reads mapping to two distinct tandem genes, which are indicative of RNA chimeras generated by intergenic splicing following a read-through event (Figure 7A, Supplementary file 1H). Analysis of the splicing pattern of these chimeras showed that most events join the second-last exon of the upstream gene with the second exon of the downstream gene (Figure 7B). This pattern is compatible with the exon definition model, according to which transcription termination is necessary for the selection of the terminal 3’ splice site (ss) (Niwa et al., 1992; Dye and Proudfoot, 1999). In the absence of transcription termination, the terminal 3’ss is evicted and the terminal 5’ss will splice together with the first 3’ss of the downstream gene, which emerges from the nascent transcript once RNAPII reaches the second exon (Figure 7B). Interestingly, the number of RNA chimeras detected correlated positively with the levels of transcription read-through (p = 0.003; R = 0.41) and was significantly higher in the sample group with a ‘high read-through’ phenotype defined above (Figure 7C).

Read-through RNA chimeras are prevalent in ccRCC.

(A) Circos plots showing the location of genes forming each RNA chimera detected on human ccRCC cell lines and TCGA samples. Chimeras are represented by curves inside the Circos. (B) Number of read-through RNA chimeras formed by intergenic splicing between the represented exons. (C) Number of read-through RNA chimeras in the low and high read-through sample subsets defined in Figure 2A.

https://doi.org/10.7554/eLife.09214.012

A remarkable feature of these RNA chimeras is that some were recurrently detected in different tumor samples. One particular chimera, encoded by two consecutive genes - CTSC and RAB38 - was detected in 20% of the TCGA samples (but not in any matched normal sample) and in four of the six ccRCC cell lines that we sequenced de novo (Supplementary file 1H). We validated this RNA chimera by RT-qPCR before and after transfection of the ccRCC cell lines with a small interfering RNA (siRNA) spanning the transcript break-point, which resulted in a robust depletion of CTSC-RAB38 (Figure 8A,B). In contrast, siRNAs targeting either the last exon of CTSC or the first exon of RAB38 (which are not included in the chimeric transcript) significantly decreased the levels of CTSC and RAB38, respectively, but not the levels of the CTSC-RAB38 chimera (Figure 8B,C). Moreover, we measured the RNAPII occupancy throughout the intergenic region and within the body of the CTSC and RAB38 genes by chromatin immunoprecipitation (ChIP). We detected a robust occupancy of RNAPII throughout the intergenic region (Figure 8D), which further suggests that the CTSC-RAB38 chimera is generated following a read-through episode without any genomic deletion or translocation involved.

The CTSC-RAB38 chimera is recurrently detected in ccRCC.

(A) Schematic illustration of the CTSC-RAB38 locus depicting the position of the primers used to measure the transcript levels by RT-qPCR shown in (B) and (C) and the position of the siRNAs targeting each of the three transcripts (CTSC, RAB38 and the CTSC-RAB38 chimera). The dashed curve in the scheme illustrates the splicing pattern of the chimeric transcript. (B) Relative expression of the CTSC-RAB38 chimeric transcript after depletion of the indicated transcripts by RNAi in three distinct ccRCC cell lines (FG2, MF, ER). (C) Relative expression of CTSC and RAB38 upon depletion of the indicated transcripts by RNAi in FG2 cells. Similar results were obtained with the other ccRCC cell lines. *p<0.05 by Student’s T-test compared to controls. (D) RNAPII ChIP along the CTSC-RAB38 locus in FG2 cells. Means and standard deviations from five independent experiments are shown. *p<0.05 by Student’s T-test compared to the gene desert.

https://doi.org/10.7554/eLife.09214.013

Discussion

Our transcriptome analysis of a large dataset of tumor and normal matched samples revealed that transcription events extending beyond the annotated 3’ end of genes are frequent in ccRCC. Strikingly, our analysis further disclosed an unexpected prognostic power of transcription read-through in kidney cancer: a higher number of genes with transcription read-through correlates significantly with poorer patient survival. Amongst the most frequently mutated genes in ccRCC, we identified SETD2 inactivation as a contributing factor to increased transcription read-through. In fact, ectopic expression of SETD2 was sufficient to rescue the transcription termination defects of SETD2 mutant ccRCC cells. Moreover, we report that the effects of impaired transcription termination are not confined to the affected gene. Instead, read-through also perturbs the expression of neighboring genes that are overrun by reading-through RNAPII complexes. Importantly, amongst the genes whose expression directly correlates with the volume of transcription read-through on the upstream gene, we detected the anti-apoptotic oncogene BCL2. This finding unveils a new source of aberrant expression of cancer-related genes and provides a plausible mechanistic basis for the upregulation of BCL2 frequently observed in ccRCC.

Our present study further reveals recurrent RNA chimeras in ccRCC combining sequences from two tandem genes. Such chimeras are generated following extensions of RNAPII beyond the annotated gene boundaries and invasion of an adjacent gene as a result of impaired transcription termination. According to the exon definition model (Niwa et al., 1992; Dye and Proudfoot, 1999), deficient transcription termination is expected to impair the splicing of the last exon. In agreement, most chimeras skip the last exon of the upstream gene and the prevailing splicing pattern joins the second-last exon of the upstream gene with the second exon of the downstream gene. The finding that several of these RNA chimeras are recurrently detected in two or more tumor samples suggests that they were selected during cancer development and supports the exciting possibility that they may play relevant functional roles.

Although our study primarily focused on the characterization of the transcription termination defects and their impact on the ccRCC transcriptome, our pioneering findings raise intriguing questions. Which mechanism(s) drive such transcription defects? How are individual RNA chimeras functionally involved in tumorigenesis? Do chimeric transcripts produce functional oncoproteins and/or directly affect the expression of other relevant cancer genes? How can we intervene therapeutically to restore the canonical transcription pattern? The widespread incidence of aberrant termination in ccRCC cells suggests that it may play important roles in expanding the transcriptome diversity that drives cancer development and progression. The impact on BCL2 expression illustrates a relevant functional outcome of transcription read-through. The generation of chimeric transcripts, namely those recurrently identified in several samples, such as CTSC-RAB38, further discloses the contribution of impaired transcription termination to the expansion of RNA species that may favor cancer progression. Nevertheless, future studies should provide direct evidence that a chimeric transcript has oncogenic functions in order to support the physiological relevance of these RNAs. This is a challenging and very exciting topic and further efforts are required to fully elucidate the impact of impaired transcription termination on cancer in general and on ccRCC in particular.

Materials and methods

Cell culture

ccRCC cells (Caki-1, Caki-2, MF, ER, AB and FG2, Cell Line Services Eppelheim, Germany) were grown as monolayers in Dulbecco’s modified Eagle medium (DMEM, Invitrogen, CA, USA), supplemented with 10% (v/v) FBS, 1% (v/v) nonessential amino acids, 1% (v/v) L-glutamine and 100 U/ml penicillin-streptomycin, and maintained at 37°C in a humidified atmosphere with 5% CO2.

RNA interference

RNAi was achieved using synthetic siRNA duplexes (Eurogentec, Belgium). The sequence of the siRNAs is shown in Supplementary file 1I. siRNAs targeting the firefly luciferase (GL2) were used as controls. Cells were reverse transfected with 10 μM siRNAs using OptiMEM (Invitrogen) and Lipofectamine RNAiMAX (Invitrogen), according to the manufacturer’s instructions. 24 hr after the first transfection, cells were re-transfected with the same siRNA duplexes and transfection reagents and harvested on the following day.

RNA isolation and quantitative RT–PCR

Total RNA was extracted with TRIzol (Invitrogen). cDNA was made using Superscript II Reverse Transcriptase (Invitrogen). RT-qPCR was performed in the ViiA Real Time PCR (Applied Biosystems, CA, USA), using SYBR Green PCR master mix (Applied Biosystems). The relative RNA expression was estimated as follows: 2^(Ct reference – Ct sample), where Ct reference and Ct sample are mean threshold cycles of RT-qPCR done in duplicate on cDNA samples from U6 snRNA (reference) and the cDNA from the genes of interest (sample). All primer sequences are presented in Supplementary file 1I.
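The relative expression formula above can be illustrated with a short sketch. This is an illustrative calculation under stated assumptions rather than the authors' script: the function name and the Ct values are hypothetical, and duplicate measurements are simply averaged as the text describes.

```python
from statistics import mean


def relative_expression(ct_reference, ct_sample):
    """Relative RNA level as 2^(Ct reference - Ct sample).

    Both arguments are lists of replicate threshold cycles (duplicates in
    the text); the mean Ct of each is used. Hypothetical helper.
    """
    return 2 ** (mean(ct_reference) - mean(ct_sample))


# Made-up Ct values: U6 snRNA as reference, a gene of interest as sample.
u6_ct = [20.0, 20.0]
gene_ct = [22.0, 22.0]
level = relative_expression(u6_ct, gene_ct)
```

With these made-up values the gene of interest crosses the threshold two cycles later than U6, giving a relative level of 2^-2. The same function form applies to the ChIP occupancy estimate, 2^(Ct Input – Ct IP), by passing the input and immunoprecipitation Ct values instead.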

Chromatin immunoprecipitation

ChIP was performed as described (de Almeida et al., 2011). The relative occupancy of RNAPII at each locus was estimated by RT-qPCR as follows: 2^(Ct Input – Ct IP), where Ct Input and Ct IP are mean threshold cycles of RT-qPCR done in duplicate on DNA samples from input and specific immunoprecipitations, respectively. RNAPII was precipitated with an antibody against its largest subunit (N20, sc-899; Santa Cruz, TX, USA). The sequences of gene-specific, intergenic regions and gene desert primer pairs are presented in Supplementary file 1I.

RNA fluorescence in situ hybridization (FISH)

FG2 ccRCC cells were transiently transfected with a wt SETD2-GFP expression plasmid (Carvalho et al., 2014) and cultured on glass coverslip for 24 hr before hybridization with RNA FISH probes (Biosearch Technologies, CA, USA) following the manufacturer’s protocol. Briefly, cells were washed with PBS, fixed for 10 min at room temperature, washed twice with PBS, and permeabilized at 4°C in 70% (vol/vol) EtOH. Probes diluted in hybridization buffer were added to permeabilized cells before overnight incubation in a dark chamber at 37°C. After washing, DAPI was added to stain the nuclei. Epi-fluorescence microscopy was performed using a Zeiss Z1 microscope equipped with Z-piezo (Prior, MA, USA), a 63x 1.4 NA Plan-Apochromat objective and a sCMOS camera (Hamamatsu Flash 4.0). FISH probes were designed to target a segment of the RNA transcript encoded by the intergenic region downstream of the canonical termination sites of MRPL23 and SEL1L3. The sequences of the probes are shown in Supplementary file 1J.

RNA-seq datasets and preprocessing

Samples were barcoded and prepared for sequencing by the Centro Nacional de Análisis Genómico (CNAG, Barcelona, Spain) using Illumina protocols. PolyA+ RNA-seq libraries of ccRCC cell lines were sequenced as paired-end 75-bp sequence tags using the standard Illumina pipeline. ccRCC RNA-seq datasets (with at least 40 million mapped reads on each tumor and matched normal samples – in a total of 50 paired-samples listed in Supplementary file 1A) were obtained from TCGA. HEK293 RNA-seq data were from the Sequence Read Archive (SRX876600). RNA-seq data from SETD2 KO ccRCC cells (786-O), SETD2-depleted human MS cells and Setd2 KO mouse ES cells were obtained from the GEO (http://www.ncbi.nlm.nih.gov/geo/ GSE66879, GSE19373 and GSE54932, respectively). Data quality was assessed with the FastQC (http://www.bioinformatics.babraham.ac.uk/projects/fastqc/) quality-control tool for high throughput sequence data. Sequence tags were then mapped to the reference human (hg19) or mouse (mm9) genomes with TopHat software using default parameters (Kim et al., 2013). Reads from samples with multiple sequencing lanes were merged and only the best score alignment was accepted for each read.

Transcriptome alterations

Gene annotations were obtained from UCSC knownGene and refGene tables (Karolchik et al., 2014) and merged into a single transcript model per gene using BedTools (Quinlan and Hall, 2010). Our analysis was restricted to transcriptionally active genes, defined as those with expression levels higher than the 25th percentile. To identify genes with transcriptional read-through, we filtered out all genes for which there was another annotated gene on either strand within 5 Kb downstream of their TTS. We also filtered out genes with an overall increase in expression level relative to the control sample. Reads were counted in 100 bp windows across the 4 Kb region downstream of the TTS and normalized for window length and the total number of mapped reads in each sample (RPKMs) (Mortazavi et al., 2008). We called transcriptional read-through when more than six 100 bp windows showed increased RPKMs (at least 1.5 fold-change) relative to control. The control samples were: Caki2 for ccRCC cell lines; the matched normal tissue sample for each ccRCC patient; control 786-O ccRCC cells for SETD2 KO 786-O cells; human MS cells for SETD2-depleted human MS cells; and wt mouse ES cells for Setd2 KO mouse ES cells. In the ccRCC cell lines dataset, all analyses were also performed comparing the two SETD2 wt cell lines (Caki1 vs Caki2), which served as a negative control. Statistical significance of the differences between proportions of genes showing read-through was assessed using Fisher’s exact test. RNA chimeras were detected using the fusion-search option (fusion-min-dist set to 100 bp) in TopHat alignment (Kim et al., 2013) and TopHat-Fusion (Kim and Salzberg, 2011) with default parameters. For downstream analysis we only considered RNA chimeras supported by at least two reads. In-house scripts were written in bash and in the R environment (http://www.R-project.org/) (Team RDC, 2011).
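
The window-based read-through call described above can be sketched in Python as follows. This is an illustrative reconstruction from the stated thresholds, not the in-house bash/R code: the function names are hypothetical, and the pseudocount used to avoid division by zero is an assumption that the text does not mention.

```python
from typing import List, Sequence

WINDOW_BP = 100         # window size stated in the text
MIN_FOLD_CHANGE = 1.5   # "at least 1.5 fold-change"
MIN_WINDOWS = 6         # read-through requires MORE than six such windows
PSEUDOCOUNT = 0.01      # assumption: guards against zero RPKM in the control


def rpkm(read_count: int, region_bp: int, total_mapped_reads: int) -> float:
    """Reads per kilobase of region per million mapped reads."""
    return read_count * 1e9 / (region_bp * total_mapped_reads)


def window_rpkms(counts: Sequence[int], total_mapped_reads: int) -> List[float]:
    """Normalize per-window read counts downstream of the TTS."""
    return [rpkm(c, WINDOW_BP, total_mapped_reads) for c in counts]


def has_read_through(
    sample_counts: Sequence[int],
    control_counts: Sequence[int],
    sample_total: int,
    control_total: int,
) -> bool:
    """Return True if more than six windows are increased >= 1.5-fold."""
    if len(sample_counts) != len(control_counts):
        raise ValueError("sample and control must cover the same windows")
    sample = window_rpkms(sample_counts, sample_total)
    control = window_rpkms(control_counts, control_total)
    increased = sum(
        1
        for s, c in zip(sample, control)
        if (s + PSEUDOCOUNT) / (c + PSEUDOCOUNT) >= MIN_FOLD_CHANGE
    )
    return increased > MIN_WINDOWS


# 40 windows of 100 bp cover the 4 Kb region downstream of the TTS.
control = [10] * 40
tumor = [20] * 7 + [10] * 33  # seven windows doubled relative to control
call = has_read_through(tumor, control, 1_000_000, 1_000_000)
```

With seven windows at a two-fold increase, `call` is `True`; with exactly six such windows the gene would not be called, because the criterion is strictly more than six.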

Graphical representation of data

Figures were produced using BedTools (Quinlan and Hall, 2010) and default packages from the R environment. To produce heatmaps and metagene average profiles showing transcription read-through, genes were scaled to 60 equally sized bins so that all annotated TSSs and TTSs were aligned. Regions 4 Kb upstream of TSSs and 4 Kb downstream of TTSs were averaged in 100-bp windows. Individual gene profiles were produced from consecutive 10-bp windows (single gene) or 100-bp windows (two genes). All read counts were normalized by genomic region length and number of mapped reads (RPKM), and RPKM values were log2 transformed when representing multiple genes. Protein levels assessed by reverse phase protein array (RPPA) were gathered from the TCGA portal. Circle plots showing the distribution of RNA chimeras were produced using Circos (Krzywinski et al., 2009).
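
The gene-body scaling used for metagene profiles can be sketched as below. This is a minimal Python illustration under stated assumptions, not the R code used for the figures: the function names are hypothetical, the input is assumed to be per-base coverage for each gene body, and the pseudocount in the log2 transform is an assumption.

```python
import math
from typing import List, Sequence


def scale_to_bins(coverage: Sequence[float], n_bins: int = 60) -> List[float]:
    """Average per-base coverage of one gene body into n_bins equal bins,
    so that genes of different lengths share aligned TSS and TTS positions."""
    n = len(coverage)
    if n < n_bins:
        raise ValueError("gene is shorter than the number of bins")
    bins = []
    for i in range(n_bins):
        start = i * n // n_bins
        end = (i + 1) * n // n_bins
        chunk = coverage[start:end]
        bins.append(sum(chunk) / len(chunk))
    return bins


def metagene_profile(genes: Sequence[Sequence[float]], n_bins: int = 60) -> List[float]:
    """Average the scaled profiles of several genes, bin by bin."""
    scaled = [scale_to_bins(g, n_bins) for g in genes]
    return [sum(column) / len(column) for column in zip(*scaled)]


def log2_signal(value: float, pseudocount: float = 1.0) -> float:
    """log2 transform; the pseudocount is an assumption to handle zeros."""
    return math.log2(value + pseudocount)


# Two genes of different lengths collapse onto the same 60-bin axis.
short_gene = [1.0] * 120
long_gene = [3.0] * 600
profile = metagene_profile([short_gene, long_gene])
```

Both genes contribute one value per bin regardless of length, so `profile` has 60 entries, each the mean of the two scaled profiles.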

Accession codes for RNA-seq data

The RNA-seq data for ccRCC cell lines have been deposited in the Gene Expression Omnibus (GEO; http://www.ncbi.nlm.nih.gov/geo/) database under accession number GSE64451. ccRCC RNA-seq datasets were obtained from TCGA. HEK293 RNA-seq data were from the Sequence Read Archive (SRX876600). RNA-seq data from SETD2 KO 786-O ccRCC cells, human mesenchymal stem cells and mouse embryonic stem cells were obtained from the GEO (GSE66879, GSE19373 and GSE54932, respectively).

Data availability

The following data sets were generated
The following previously published data sets were used

    Ho TH, Nie J, Yan H (2015) RNA sequencing of SETD2 isogenic renal cell carcinoma cell lines. Publicly available at the NCBI Gene Expression Omnibus (Accession no: GSE66879).

References

    Creighton CJ, Morgan M, Gunaratne PH, et al.; The Cancer Genome Atlas Research Network (2013) Comprehensive molecular characterization of clear cell renal cell carcinoma. Nature 499:43–49. https://doi.org/10.1038/nature12222

    Team RDC (2011) R: A Language and Environment for Statistical Computing. Vienna, Austria: R Foundation for Statistical Computing.

Article and author information

Author details

  1. Ana R Grosso

    Instituto de Medicina Molecular, Faculdade de Medicina da Universidade de Lisboa, Lisboa, Portugal
    Contribution
    ARG, Conception and design, Acquisition of data, Analysis and interpretation of data, Drafting or revising the article
    For correspondence
    agrosso@medicina.ulisboa.pt
    Competing interests
    The authors declare that no competing interests exist.
  2. Ana P Leite

    Instituto de Medicina Molecular, Faculdade de Medicina da Universidade de Lisboa, Lisboa, Portugal
    Contribution
    APL, Conception and design, Acquisition of data, Analysis and interpretation of data
    Competing interests
    The authors declare that no competing interests exist.
    ORCID: 0000-0001-5773-3211
  3. Sílvia Carvalho

    Instituto de Medicina Molecular, Faculdade de Medicina da Universidade de Lisboa, Lisboa, Portugal
    Contribution
    SC, Acquisition of data, Analysis and interpretation of data
    Competing interests
    The authors declare that no competing interests exist.
  4. Mafalda R Matos

    Instituto de Medicina Molecular, Faculdade de Medicina da Universidade de Lisboa, Lisboa, Portugal
    Contribution
    MRM, Acquisition of data, Analysis and interpretation of data
    Competing interests
    The authors declare that no competing interests exist.
  5. Filipa B Martins

    Instituto de Medicina Molecular, Faculdade de Medicina da Universidade de Lisboa, Lisboa, Portugal
    Contribution
    FBM, Acquisition of data, Analysis and interpretation of data
    Competing interests
    The authors declare that no competing interests exist.
  6. Alexandra C Vítor

    Instituto de Medicina Molecular, Faculdade de Medicina da Universidade de Lisboa, Lisboa, Portugal
    Contribution
    ACV, Acquisition of data, Analysis and interpretation of data
    Competing interests
    The authors declare that no competing interests exist.
  7. Joana MP Desterro

    Instituto de Medicina Molecular, Faculdade de Medicina da Universidade de Lisboa, Lisboa, Portugal
    Contribution
    JMPD, Acquisition of data, Analysis and interpretation of data, Contributed unpublished essential data or reagents
    Competing interests
    The authors declare that no competing interests exist.
  8. Maria Carmo-Fonseca

    Instituto de Medicina Molecular, Faculdade de Medicina da Universidade de Lisboa, Lisboa, Portugal
    Contribution
    MCF, Analysis and interpretation of data, Drafting or revising the article
    Competing interests
    The authors declare that no competing interests exist.
  9. Sérgio F de Almeida

    Instituto de Medicina Molecular, Faculdade de Medicina da Universidade de Lisboa, Lisboa, Portugal
    Contribution
    SFDA, Conception and design, Acquisition of data, Analysis and interpretation of data, Drafting or revising the article
    For correspondence
    sergioalmeida@fm.ul.pt
    Competing interests
    The authors declare that no competing interests exist.
    ORCID: 0000-0002-7774-1355

Funding

Fundação para a Ciência e a Tecnologia (PTDC/BIM-ONC/0384-2012)

  • Sérgio F de Almeida

Fundação para a Ciência e a Tecnologia (SFRH/BD/92208/2013)

  • Mafalda R Matos

Fundação para a Ciência e a Tecnologia (SFRH/BD/52232/2013)

  • Alexandra C Vítor

Fundação para a Ciência e a Tecnologia (IF/00510/2014)

  • Ana R Grosso

The funders had no role in study design, data collection and interpretation, or the decision to submit the work for publication.

Acknowledgements

We thank our colleagues Nuno Barbosa-Morais, Sérgio Dias and Edgar Gomes for critical comments and suggestions. We also thank Ioana Posa, Mafalda Pimentel, Célia Carvalho and the Bioimaging Unit of the IMM for technical assistance. This work was supported by Fundação para a Ciência e a Tecnologia (FCT), Portugal (PTDC/BIM-ONC/0384-2012 to SFdA). MRM is an FCT PhD fellow (SFRH/BD/92208/2013). ACV is a Lisbon BioMed PhD fellow funded by FCT (SFRH/BD/52232/2013). ARG is the recipient of an FCT Investigator award (IF/00510/2014).

Version history

  1. Received: June 3, 2015
  2. Accepted: November 16, 2015
  3. Accepted Manuscript published: November 17, 2015 (version 1)
  4. Version of Record published: January 27, 2016 (version 2)

Copyright

© 2015, Grosso et al.

This article is distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use and redistribution provided that the original author and source are credited.

Cite this article

Ana R Grosso, Ana P Leite, Sílvia Carvalho, Mafalda R Matos, Filipa B Martins, Alexandra C Vítor, Joana MP Desterro, Maria Carmo-Fonseca, Sérgio F de Almeida (2015) Pervasive transcription read-through promotes aberrant expression of oncogenes and RNA chimeras in renal carcinoma. eLife 4:e09214. https://doi.org/10.7554/eLife.09214
