Integrated evaluation of telomerase activation and telomere maintenance across cancer cell lines
Abstract
In cancer, telomere maintenance is critical for the development of replicative immortality. Using genome sequences from the Cancer Cell Line Encyclopedia and Genomics of Drug Sensitivity in Cancer Project, we calculated telomere content across 1299 cancer cell lines. We find that telomerase reverse transcriptase (TERT) expression correlates with telomere content in lung, central nervous system, and leukemia cell lines. Using CRISPR/Cas9 screening data, we show that lower telomeric content is associated with dependency of CST telomere maintenance genes. Increased dependencies of shelterin members are associated with wild-type TP53 status. Investigating the epigenetic regulation of TERT, we find widespread allele-specific expression in promoter-wildtype contexts. TERT promoter-mutant cell lines exhibit hypomethylation at PRC2-repressed regions, suggesting a cooperative global epigenetic state in the reactivation of telomerase. By incorporating telomere content with genomic features across comprehensively characterized cell lines, we provide further insights into the role of telomere regulation in cancer immortality.
Introduction
Telomeres, repetitive nucleoprotein complexes located at chromosomal ends, are an important component of genomic stability (de Lange, 2009). As protective chromosomal caps, telomeres prevent potentially lethal end-fusion events (McClintock, 1941; McClintock, 1942) and mis-processing of chromosomal ends as damaged sites by the DNA repair machinery (de Lange, 2005; Verdun and Karlseder, 2007). Due to factors such as incomplete DNA replication and oxidative stress, telomeres gradually shorten with successive rounds of cell division (Olovnikov, 1973). If left unchecked, telomere attrition eventually triggers growth arrest and senescence, and further shortening can lead to acute chromosomal breakage and cell death. Unrestricted telomere shortening therefore acts as a major obstacle in the course of tumor development (Hackett and Greider, 2002; Lorbeer and Hockemeyer, 2020), and inhibition of telomere maintenance offers still largely untapped opportunities for targeted cancer therapies (Damm et al., 2001; Dikmen et al., 2005; Flynn et al., 2015).
Telomere shortening in embryonic development and in certain adult cell populations is offset by telomerase (Greider and Blackburn, 1985), a ribonucleoprotein enzyme with a core reverse transcriptase, TERT, that lengthens telomeres by catalyzing the addition of TTAGGG nucleotide repeats from an inbuilt RNA template component, TERC (Feng et al., 1995; Yu et al., 1990). Although telomerase is transcriptionally silenced in the majority of somatic cells, telomerase is reactivated in over 85% of all human cancers (Kim et al., 1994). Despite having activated telomere maintenance mechanisms, most cancers tend to have shorter telomeres than normal tissues, perhaps due to telomere maintenance mechanisms developing only after a critical state of telomere crisis has been reached (Okamoto and Seimiya, 2019). Reactivation of telomerase is associated with a diverse set of genomic alterations, the most common of which include highly recurrent mutations in the TERT promoter (Horn et al., 2013; Huang et al., 2013), aberrant methylation (Lee et al., 2019) or copy number amplification (Zhang et al., 2000) of TERT, and modulation of the numerous transcription factors that regulate TERT expression (Greider, 2012; Wu et al., 1999). Of the minority of cancers that do not reactivate telomerase, many depend upon alternative lengthening of telomeres (ALT), a process that exploits mechanisms of homologous recombination and is characterized by heterogeneous telomere lengths, mutations in the ATRX and DAXX chromatin-regulating factors, and genome instability (Cesare and Reddel, 2010).
The readily identifiable nature of telomeric DNA repeats has motivated the development of computational methods for the determination of telomere content from whole-genome sequencing (WGS) and whole-exome sequencing (WES) data (UK10K Consortium et al., 2014). Recently, such methods were employed to characterize telomere content across tumor sequencing data from panels such as The Cancer Genome Atlas (TCGA), the Genotype-Tissue Expression (GTEx) project, and the Pan-Cancer Analysis of Whole Genomes (PCAWG) study, which have identified genomic markers of relative telomere lengthening and maintenance mechanisms (Barthel et al., 2017; Castel et al., 2019; PCAWG-Structural Variation Working Group et al., 2020). To gain a greater functional understanding of the landscape of telomere maintenance in cancer, we estimated telomeric DNA content (subsequently referred to as telomere content Feuerbach et al., 2019) across a diverse array of human cancer cell lines profiled in the Cancer Cell Line Encyclopedia (CCLE) (Barretina et al., 2012; Ghandi et al., 2019) and Genomics of Drug Sensitivity in Cancer (GDSC) (Yang et al., 2013) projects. Although cancer cell lines are immortalized, often through TERT activation, cell lines can serve as models for living tissues in aspects such as gene expression and have been deeply profiled (Barretina et al., 2012; Ghandi et al., 2019). We hypothesized that assessing telomere content could reflect underlying mechanisms of attrition, maintenance, and repair, which may be reflected in associations with genetic markers. By combining these estimates with a rich set of existing CCLE annotations, we aimed to determine genetic, epigenetic, and functional markers of telomere content and telomerase activity across a diverse panel of human cancer cell lines.
Results
Telomere content across cancer cell lines
Telomeric reads can be identified in DNA-sequencing reads using the canonical tandemly repeated TTAGGG motif, and normalized telomeric read counts may provide an accurate estimate of telomere content (UK10K Consortium et al., 2014). We quantified telomere content across cell lines using WGS and WES data from the independent CCLE (Barretina et al., 2012; Ghandi et al., 2019) and GDSC (Yang et al., 2013) datasets. In particular, we considered 329 cell lines profiled with WGS and 326 with WES in the CCLE, and 1056 samples profiled with WES in the GDSC, of which 55 were non-cancerous matched-normal samples. We note that our estimates state telomeric DNA repeat tract content, which is a normalized measure of telomeric reads in a sample, rather than solely telomere length, because telomere length requires the identification of true telomeric DNA from intrachromosomal, non-terminal telomeric DNA repeat tracts and extrachromosomal telomeric DNA (Feuerbach et al., 2019). To assess the fidelity of our telomere content measurements, we examined the agreement between the telomere content estimates in overlapping cell lines from independent sequencing datasets (Figure 1—figure supplements 1 and 2a). We observed high agreement between the telomere content estimates derived from CCLE WGS and GDSC WES data (r = 0.84, p = 3.7×10−79, n = 286) and moderate agreement between CCLE WGS and CCLE WES estimates (r = 0.71, p = 1.5×10−6, n = 36), suggesting that our length estimates were not strongly biased by source. Among the three datasets used, CCLE WGS samples each captured at least ten thousand telomeric reads (which we estimated as those with six or more TTAGGG repeats), followed by GDSC WES samples at 100–1000 telomeric reads, and lastly by CCLE WES samples with 10–100 telomeric reads (Figure 1—figure supplement 2b). Moreover, one cell line (HEL 92.1.7) was twice-sequenced in the CCLE WGS dataset and these replicates had similar raw telomere length estimates (4.00 and 4.06 kilobases). Based on the agreement between CCLE WGS and GDSC WES, we generated a merged telomere content dataset of 1099 cell lines (Supplementary Methods) by combining the normalized log-transformed telomere contents derived from the CCLE WGS and GDSC WES datasets for downstream analyses.
Given these telomere content estimates, we first sought to examine the general distribution across cell lines and with respect to key cell line attributes such as donor age and population doubling rate. The overall distribution of telomere content displayed a slight skew (Figure 1—figure supplement 2a) toward longer telomeres, perhaps reflective of cell lines dependent upon ALT, a hallmark of which is telomeres of abnormal and heterogeneous lengths (Bryan et al., 1997; Heaphy et al., 2011b). We matched a substantial number of cell lines (282 for CCLE WGS, 554 for GDSC WES) with the age of the donor at the time of removal, from which we observed weak negative (vs. CCLE WGS: r = −0.05, p = 0.39; vs. GDSC WES: r = −0.17, p = 6.0×10−5) correlations between telomere content and the age of the original donor (Figure 1—figure supplement 3a). Similarly, we also found TC to be negatively correlated with the log CCLE-calculated cell line doubling rate (vs. CCLE WGS: r = −0.17, p = 0.01; vs. GDSC WES: r = −0.11, p = 0.03) (Figure 1—figure supplement 3b), consistent with shorter telomeres in cell lines modestly associated with higher doubling rates. Among 1099 merged CCLE WGS and GDSC WES samples, we found raw telomere content to vary substantially both between (p = 2.0×10−15, Kruskal-Wallis H test) and within (Figure 1) cell lines of different primary sites (Figure 1—figure supplement 3c,d). Cell lines of hematopoietic origin (namely leukemias and lymphomas, which comprised 156 lines) tended to have higher telomere contents on average (p = 2.0×10−8, two-sided Mann-Whitney U test), perhaps due to their elevated levels of telomerase expression, which were the highest among all subtypes. The 55 non-cancerous samples profiled as part of the GDSC displayed relatively high telomere contents (p = 5.9×10−4, two-sided Mann-Whitney U test, Figure 1), consistent with previous reports of widespread telomere shortening in cancer (Barthel et al., 2017). The greatest median telomere content, however, was found across lymphocyte and blood cell lines (Figure 1). The cell line with the highest telomere content was the U2-OS osteosarcoma line, a well-characterized model for ALT (Bryan et al., 1997).

Telomere content and related genomic features across human cell lines.
Cell lines were grouped by cancer type and ordered by telomere content within each type, and are displayed such that each column represents a cell line. Telomere content measurements reflect combined z-scored estimates derived from CCLE WGS and GDSC WES with means for samples with telomere content estimates from both sources. Bars within each cancer type represent medians. Relative copy number values are shown as log2(relative to ploidy + 1)–1. Cell lines shown are filtered such in addition to annotations for telomere content, values for TERT and TERC RNA-seq expression, TERT and TERC copy number, and ATRX and DAXX mutation status are all available (with an exception made for non-cancerous cell lines, which lack such profiling in DepMap). Cell lines were also filtered such that each cancer type is represented by at least 10 cell lines (n = 738 cell lines total). RNA expression estimates are in terms of log2(TPM+1). CNS: central nervous system; PNS, peripheral nervous system; UADT, upper aerodigestive tract.
Genomic alterations associated with telomere content
Having determined telomere content in the context of cell line meta-attributes, we next sought to leverage the substantial genomic profiles of the CCLE to find correlates of telomere content. To characterize the genomic signatures of telomere content, we correlated the CCLE WGS and GDSC WES telomere content estimates against molecular annotations in the CCLE, with a focus on alterations known to be associated with telomere maintenance. First, we observed that telomere content and TERT overall and isoform-specific mRNA levels were positively though weakly correlated as determined by RNA sequencing (RNA-seq) both within each subtype (Figure 1—figure supplement 4a,b) and overall (Figure 1—figure supplement 4c,d). In specific cancer types such as CNS, lung, and leukemia, we found a higher correlation between TERT mRNA expression and telomere content (Figure 1—figure supplement 4a,b). Furthermore, we found negative associations between TERT mRNA levels and telomere content in bone and peripheral nervous system cell lines (Figure 1—figure supplement 4a,b). Because ALT is most commonly found in these cancer types, this may be a consequence of the near-mutual exclusivity between TERT expression and markers of ALT (Killela et al., 2013; Lee et al., 2018). Although mutations in ATRX and DAXX are closely associated with the development of the ALT phenotype (Brosnan-Cashman et al., 2018; Clynes et al., 2015; Heaphy et al., 2011a; ALT Starr Cancer Consortium et al., 2012; Ramamoorthy and Smith, 2015), comparisons between telomere content and mutations in ATRX and DAXX yielded significant associations only between DAXX alterations and merged telomere content (Figure 1—figure supplement 5a). We further repeated association tests with TP53, VHL, and IDH1 as identified previously among TCGA samples (Barthel et al., 2017) and confirm that truncating VHL mutations are associated with reduced telomere lengths (Figure 1—figure supplement 5a), although this may be confounded by the high occurrence of VHL mutations in kidney cell lines. Testing telomere content association with molecular features profiled in the CCLE, we identified several genes known to be associated with telomere biology, suggesting that our integration of telomere content with existing annotations could identify features relevant to telomere maintenance mechanisms (Figure 1—figure supplement 5b). While we found relatively few significant associations between telomere content and mutations, we note that we were limited to an absolute estimate of telomere content as opposed to a relative measure of somatic telomere lengthening, which requires a paired normal sample (Barthel et al., 2017).
Telomere content associates with CST complex dependencies
Given the associations between telomere content and several genomic and transcriptomic features, we next considered whether variations in telomere content could confer or reduce selective vulnerabilities to inactivation of certain genes. In particular, we hypothesized that telomere content may be associated with vulnerabilities to reductions in the levels of telomere-regulating proteins. These vulnerabilities may be measured using CRISPR- and RNAi-based depletion assays, which can be processed to yield a numerical value for the dependency of a gene within a cell line, with more negative values indicative of increased dependence on a particular gene. To reveal such associations, we correlated our telomere content estimates with gene inactivation sensitivities assessed via genome-wide CRISPR-Cas9 (Avana; Meyers et al., 2017) and RNAi viability screens (Achilles RNAi [Tsherniak et al., 2017] and DRIVE [McDonald et al., 2017]). Although we found no dependencies that displayed outlier associations with telomere content in the Achilles RNAi screen (Figure 1—figure supplement 5b), we discovered that sensitivity to Avana CRISPR-Cas9 knockouts of each of the three CST complex proteins as well as the telomere-associating protein TERF1 were outlier associations with telomere content estimates computed from both the GDSC WES and CCLE WGS data (Figure 2a,b). Specifically, sensitivity to knockout of the CST complex components (CTC1, STN1, TEN1), which are key mediators of telomere capping and elongation termination (Chen et al., 2012), was correlated with lower telomere content (Figure 2—figure supplement 1a,b). Although the CST complex was not assessed in the DRIVE screening dataset, we again found TERF1, a key shelterin component, to be among the positively correlated genes with telomere content in the DRIVE panel (Supplementary file 2). Overall, these data suggest that cancer cell lines with shorter telomeres are more susceptible to inhibition of the CST-mediated telomere maintenance mechanism.

Telomere-binding protein dependencies are associated with telomere content and TP53 mutation status.
(a) Pairwise plot of Pearson correlations between dependencies of all genes in the Avana dataset and CCLE WGS telomere content (x-axis, n = 210–211 cell lines) and GDSC WES telomere content (y-axis, n = 420–426 cell lines) estimates. (b) Pairwise plot of significance levels of correlations shown in (a) with correction for multiple hypothesis testing. (c) Pairwise Pearson correlation matrix between Avana dependencies among CST members and five shelterin components (n = 796–808 cell lines; Supplementary file 3). (d) Associations of CST and shelterin member Avana dependency scores with damaging and hotspot mutations (n = 796–808 cell lines). For each gene dependency, mutation associations were computed using rank-biserial correlations with mutants and wild-types as the two categories. p Values determined using two-sided Mann-Whitney U test. (e) Associations of shelterin member DRIVE dependency scores with damaging and hotspot mutations (n = 372–375 cell lines; Supplementary file 3) under the same scheme used in (d). (f) Network schematic of the co-dependency matrix shown in (c) and annotated with association with telomere content or TP53 mutation status.
Using the CST complex as a seed set, we subsequently queried all dependencies under the premise that associated gene dependencies reflect coordinated functions (Pan et al., 2018). Within the Avana panel (n = 757–769), we found significant (FDR < 0.01) outlier associations between the CST complex genes and genes encoding six additional telomere-associating proteins (ACD, POT1, TERF1, TERF2, TINF2, and DCLRE1B). These first five additional telomere-associating proteins, together with TERF2IP, comprise the shelterin complex, the protector and regulator of telomere length and topology (de Lange, 2005). Interestingly, whereas the five other shelterin dependencies were positively associated with telomere content, TERF2IP displayed a weak negative association (Figure 2—figure supplement 1c), suggesting that TERF2IP may play a distinct regulatory role in shelterin function compared to the other members. To examine the dependency landscape of the CST complex and these six other telomere-related proteins, we computed a correlation matrix involving these nine genes, clustering of which yielded two main subgroups: one comprised of the CST complex members, and another of the six other genes (Figure 2c). Despite this separation, POT1 and TINF2 also displayed notable correlations with CST dependencies, possibly serving as the primary mediators of previously-reported functional interactions between the shelterin and CST complexes (Chen et al., 2012; Wan et al., 2009).
Although we found strong codependency relationships within this group of telomere-associated proteins, we also observed that certain shelterin members displayed notable codependencies with p53 pathway members such as MDM2, ATM, and TP53 itself (Figure 2—figure supplement 2a,b). Because sensitivity to perturbation of the p53 pathway is highly associated with TP53 mutations in cancer (McDonald et al., 2017), we asked if these codependency relationships were also associated with hotspot mutations in TP53. In fact, TP53 was a significant (FDR < 0.001) outlier when a comprehensive set of hotspot and damaging mutations was compared against sensitivity to ACD and TERF1 dependencies in the Avana panel (Figure 2d, Figure 2—figure supplement 2c) and against TERF1 and TINF2 dependencies in the DRIVE panel (Figure 2e, Figure 2—figure supplement 2d). These links between these gene dependencies and TP53 mutation status reprise and extend previous reports of p53-dependent DNA damage responses to TERF1 and TINF2 depletion (Pereboeva et al., 2016; Rosenfeld et al., 2009). Taken together, we find that CST and shelterin dependencies are correlated with each other, telomere content, and TP53 mutation status (Figure 2f).
Patterns and mechanisms of telomerase expression
Having thoroughly characterized telomere content and its related dependencies across the CCLE, we next focused on the regulation of TERT transcription itself. Across 1019 samples previously profiled with deep RNAseq, we found that hematopoietic cell lines (leukemias, lymphomas, and myelomas) were associated with the greatest mean expression of TERT (p = 5.8×10−26, two-sided Mann-Whitney U test; Figure 1). In contrast, TERT expression was significantly reduced (p = 2.0×10−22, two-sided Mann-Whitney U test) and generally undetectable in fibroblast-like cell lines. With regard to the telomerase RNA component (TERC), we found strong associations between TERC RNA expression and RNA levels of several small Cajal body-specific RNAs (scaRNAs) and histone subunit RNAs (Figure 1—figure supplement 4e). The co-expression of TERC and these other RNAs may be a consequence of their shared localization, processing, and regulation in Cajal bodies (Gall, 2003; Venteicher et al., 2009; Zhu et al., 2004), as TERC itself contains an H/ACA box small nucleolar RNA (snoRNA) domain (Mitchell et al., 1999) and is a scaRNA family member. Moreover, these associations suggest that variations in Cajal body processing may act as factors in TERC reactivation (Cao et al., 2008). Although we focused on the transcriptional features of telomerase, it is important to note that other factors in addition to TERT and TERC expression determine telomerase activity and the eventual maintenance of telomere length (Listerman et al., 2013). Despite these orthogonal factors, TERT enzymatic activity remains strongly correlated with raw levels of TERT expression (Takakura et al., 1998).
Beyond raw gene and transcript expression levels, we next examined the underlying mechanisms for reactivating TERT in cancer using the comprehensive cell line data. Widespread transcriptional reactivation of TERT in cancer is driven by a variety of factors. Aside from copy number amplifications (Zhang et al., 2000), highly recurrent mutations in the TERT promoter drive strong monoallelic TERT re-expression (Bell et al., 2015; Huang et al., 2015). To explore the intersections between methylation, promoter mutations, and allele-specific expression, we combined available profiling data from the CCLE (Figure 3a). First, using WGS and targeted sequencing of the TERT promoter provided in the CCLE, we assessed TERT promoter status for 503 cell lines across 21 cancer types (Figure 3—figure supplement 1a). We found that only the C228T (chr5:1,295,228 C>T) mutation was significantly (p = 2.8×10−5, two-sided Mann-Whitney U test) associated with an increase in TERT expression (Figure 3—figure supplement 1b). Surprisingly, the mean level of TERT expression in monoallelic contexts was only slightly lower than that of biallelic contexts, with less than a 1.5-fold difference between the groups (p = 0.03, two-sided Mann-Whitney U test). Given that cells with biallelic TERT expression presumably express TERT with twice the transcriptional source sites as those with monoallelic TERT expression, this reduced difference may be a consequence of the effects of the TERT promoter mutation in producing particularly robust monoallelic expression (Huang et al., 2013) or expression of TERT from multiple sites all of the same allele (Rowland et al., 2019).

Allele-specific methylation of the TERT locus is indicative of both promoter mutation status and allele-specific expression.
(a) Heatmap of CpG methylation levels along the TERT locus, sorted in order of mean methylation levels along the upstream 5 kb region within TERTp-mutants and -wildtypes. TERT gene expression levels are also indicated for each cell line. Each column represents a cell line (n = 450), and each row represents a CpG pair (n = 209) sorted from the 5’ to 3’ direction along the TERT sense strand. White blocks indicate missing ASM/methylation values. Cell lines with unavailable ASM values for at least half of TERT locus CpGs were excluded. (b) ASM levels of TERT locus subregions in cell lines are indicative of TERTp status and allele-specific expression. BAE, biallelic expression; MAE, monoallelic expression. Boxes, interquartile range (IQR); center lines, median; whiskers, maximum and minimum or 1.5 × IQR; notches, 95% confidence interval of bootstrapped median using 1000 samples and a Gaussian-based asymptotic approximation. *p < 0.05, **p < 0.01, n.s, not significant; two-sided Mann-Whitney U test.
To further explore allele-specific expression (ASE) patterns of TERT, we employed an ASE-calling pipeline (Supplementary Methods) and determined TERT allele-specific expression status for 157 cell lines (Figure 3—figure supplement 1c), an increase of 69 cell lines compared to a previous report using CCLE WGS data (Huang et al., 2015). Out of these 157 cell lines, 87 (58.6%) express TERT from a single allele. Moreover, of these 157 cell lines, 129 have a sequenced promoter, with which we confirm that promoter mutations unanimously drive monoallelic expression (Figure 3—figure supplement 2a). Our expanded set of cell lines also reveals several new tissues of origin in which TERT is monoallelically expressed without a mutant promoter, such as hematopoietic cell lines (Figure 3—figure supplement 2b). This high proportion of TERT monoallelic expression led us to ask whether there are genomic alterations aside from promoter mutations that could lead to ASE. Under the assumption that such alterations may also induce ASE in a larger region than a promoter mutation, we determined ASE status in SLC6A19 and CLPTM1L, which are the most immediate neighbor genes of TERT. Because the number of samples with annotated ASE in both TERT and these neighbors was not large enough for comparisons of overlapping ASE, we instead examined the individual frequencies of ASE among these three genes. Compared to promoter-wildtype monoallelic TERT expression occurring in 36% (47 of 129) of samples, only 9.4% (11 out of 117) of samples expressed CLPTM1L from a single allele and 22% (5 of 22) expressed SLC6A19 from a single allele, and we were unable to assess co-occurrence due to lack of overlap in samples with heterozygous SNPs in both TERT and CLPTM1L or SLC6A19. Furthermore, a search for structural variants in the surrounding 100 kilobase regions yielded no significant associations. However, given that only 106 cell lines had both a known TERT ASE status and the required WGS data for structural variant determination, more genomic annotations may be needed for the discovery of additional mechanisms driving the monoallelic expression of TERT.
Distinct TERTp methylation patterns at the TERT locus
Aside from monoallelic expression, TERT promoter mutations are characterized by unique patterns of epigenetic marks, namely allele-specific CpG methylation (ASM) and H3K27me3 repressive histone modifications (Stern et al., 2015; Stern et al., 2017) and long-range chromatin interactions (Akıncılar et al., 2016). Using genome-wide RRBS data across 928 cell lines, we elucidated associations between CpG-site methylation of the TERT locus (namely, TERT and the surrounding 5 kb) and TERT promoter mutations. Examination of methylation patterns at the TERT locus revealed five prominent ASM clusters in the TERT locus, corresponding roughly to the upstream 5 kb region (containing the promoter), part of a CpG island overlapping the first exon, two other parts of this CpG island, and the remaining gene body (Figure 3a). Comparison of each region’s mean ASM against TERTp mutant status revealed that TERTp mutants exhibited strong and significant (p < 0.01) increases in ASM in the promoter (n = 485), remaining gene body (n = 493), and exon 1 (n = 478) regions (Figure 3b, Figure 3—figure supplement 2c). In contrast, TERTp wild-type cell lines tended to lack ASM throughout the TERT locus, instead being hypermethylated in all regions except for exon 1 (Figure 3b, Figure 3—figure supplement 2d), and partial hypomethylation in promoter mutants may reflect the hemizygous methylation previously observed at the TERT locus (Rowland et al., 2020; Stern et al., 2015; Stern et al., 2017). Absolute methylation of the remaining gene body was positively correlated with TERT expression in both promoter status contexts (Figure 3—figure supplement 1e), which parallels previous reports of a positive correlation between TERT expression and methylation (Barthel et al., 2017; Salgado et al., 2019). Exon 1 methylation was elevated in nearly all cell lines with monoallelic TERT expression in both the mutant promoter context (p = 6.3×10−3, two-sided Mann-Whitney U test, n = 81) and the wildtype promoter context (p = 1.6×10−4, two-sided Mann-Whitney U test, n = 98) compared to biallelic TERT expressors (Stern et al., 2017). Interestingly, although most cell lines with monoallelic TERT expression displayed partially elevated methylation levels in exon 1 (Figure 3b), only promoter-mutant cell lines were hypomethylated in the surrounding regions, suggesting that the epigenetic state of promoter mutants is in fact distinct from that of promoter-wildtype monoallelic TERT expressors.
A genome-wide epigenetic pattern in TERTp mutants
The observation that TERT promoter mutants display a hypomethylated TERT locus even compared to other monoallelic TERT expressors led us to ask if additional epigenetic signals are indicative of TERT promoter status. In particular, we considered the possibility that epigenetic changes to the TERT locus could in fact act as a cooperative factor (Kim et al., 2016; Kim and Shay, 2018) in tumorigenesis or tumor cell maintenance rather than as a consequence of TERT promoter mutations. To address this hypothesis, we performed a genome-wide search for CpG islands (CGIs) with significant differences in methylation levels in TERTp mutant cell lines compared to TERTp wild-type ones. If TERT hypomethylation were a downstream consequence of TERT promoter mutations then we would expect TERT hypomethylation to be an isolated event, and thus there would be few CGIs outside the vicinity of TERT with methylation levels correlated with TERTp mutant status. Surprisingly, we instead found a broad genome-wide distribution of CGIs that were hypomethylated in TERTpmut samples relative to TERTpWT samples (Figure 4a). Moreover, when correlated with a panel of global histone modification levels, we found that TERTp mutants exhibited increased levels of H3K9ac1K14ac0 and H3K9ac1K14ac1 marks (Figure 4b), which have been suggested as marks of transcriptionally active chromatin (Ruthenburg et al., 2007). Likewise, when H3K9ac1K14ac0 levels were compared against a genome-wide panel of CGI ASM levels, the TERT CGI (chr5:1,289,275–1,295,970) was the top correlate (Figure 4c). H3K9ac1K14ac0 levels were significantly increased in TERTp mutants compared to monoallelic TERTp wild-type cell lines, linking this histone modification to TERTp mutation (Figure 4d).

TERT promoter mutations associate with genome-wide decreased methylation of PRC2-repressed regions.
(a) Pairwise plot of median CGI methylation levels in TERTpmut cell lines (n = 21–83; Supplementary file 6) versus TERTpWT cell lines (n = 95–410, Supplementary file 6). Each dot represents a CGI. (b) Rank-biserial correlations between TERTp status (mutant or wild-type) and global histone modification levels (n = 302–475). Significance determined by two-sided Mann-Whitney U test. (c) Pearson correlation levels between global H3K9ac1K14ac0 levels and ASM imbalance of CGIs (n = 261–884). (d) H3K9ac1K14ac0 levels are significantly increased in TERTp mutants. Boxes, interquartile range (IQR); center lines, median; whiskers, maximum and minimum or 1.5 × IQR; notches, 95% confidence interval of bootstrapped median using 1000 samples and a Gaussian-based asymptotic approximation. *p < 0.01, n.s, not significant; two-sided Mann-Whitney U test. (e) LOLA core set enrichment analysis of CGIs hypomethylated in TERTpmut cell lines reveals enrichment of PRC2-repressed regions. (f) Kernel density distributions of rank-biserial correlations between CGI methylation levels for PRC2-overlapping regions and non-PRC2-overlapping regions. A negative correlation indicates that a CGI is hypomethylated in TERTpmut cell lines relative to TERTpWT ones, and a positive correlation indicates the opposite. PRC2 regions were sourced from the HepG2 segmentation. (g) LOLA ENCODE Roadmap region enrichment analysis of CGIs hypomethylated in TERTpmut cell lines reveals enrichment of H3K9me3 and H3K27me3 regions.
To better understand the distribution of these TERTpmut-hypomethylated CGIs, we utilized Locus Overlap Analysis (LOLA) (Nagraj et al., 2018; Sheffield and Bock, 2016) to query the significance of overlaps between these CGIs and predetermined region sets. Among the top 1000 TERTpmut-hypomethylated CGIs (Figure 4a), we found significant (FDR < 0.0001) and robust 10-fold enrichment for polycomb repressive complex 2 (PRC2)-repressed regions (Figure 4e) previously characterized in several cell lines (HepG2, GM12878, HeLa-S3, K562, and HUVEC). Beyond these top 1000 hypomethylated CGIs, CGIs overlapping with PRC2-repressed segments were broadly hypomethylated in TERTpmut cell lines and accounted for nearly all the previously observed skew toward hypomethylation (Figure 4f). Interestingly, the enrichment of PRC2 segments was much smaller (around 3.5-fold) in the remaining profiled cell line, H1-hESC. Against ENCODE ChIP-seq peak region sets, we also found significant overlap with the H3K9me3 and H3K27me3 heterochromatin marks (Figure 4g). Furthermore, we also observed a moderate twofold (P = 5.1×10−24, Fisher’s exact test; Figure 4—figure supplement 1a) enrichment for regions within 10 megabases of most telomeres, consistent with previous reports that PRC2-repressed and H3K27me3-marked regions are enriched in telomeric and subtelomeric regions (Rosenfeld et al., 2009). The enrichment of these hypomethylated regions among telomere-proximal regions may also be indicative of a recently-reported telomere position effect, which has been shown to affect the chromatin accessibility of the TERT locus (Kim et al., 2016) located close to the chromosome 5 p telomere. Given that PRC2 has previously been shown to exhibit allele-specific binding to the methylated silent allele in TERTpmut cell lines (Stern et al., 2017), this genome-wide pattern of hypomethylation at PRC2 sites suggests that background epigenetic events may interact with promoter mutations in facilitating TERT expression.
Given this peculiar pattern of hypomethylation at telomere-proximal sites of PRC2 repression across CCLE samples, we next asked if a similar pattern exists across TCGA primary tumor samples. To test this hypothesis, we estimated CGI methylation levels across 878 TERT-expressing TCGA samples characterized with both the Illumina 450 k array and with a previously determined TERT promoter status (Barthel et al., 2017). Consistent with previous analyses of TERT methylation levels at the cg11625005 methylation probe, we find that TERTpmut samples tended to exhibit hypomethylation (Figure 4—figure supplement 1b,c). Among 13,547 CGIs, we again found an enrichment of hypomethylation of PRC2-overlapping CGIs (Figure 4—figure supplement 1d), although this was less prominent than previously noted in the CCLE. LOLA enrichment analysis for TERTpmut-hypomethylated CGIs in the TCGA likewise confirmed significant enrichments (FDR < 0.0001) of PRC2-repressed regions and associated histone modifications as the top enriched region sets (Figure 4—figure supplement 1e,f). However, the fold-enrichment was less (about five-fold) than that observed in the CCLE and did not display any significant enrichment in telomere-proximal regions (p = 0.70, Fisher’s exact test). Although the skew toward hypomethylation in TERT promoter mutants among these TCGA samples was weaker than in the CCLE samples, this may be the result of the more heterogeneous nature of primary TCGA samples as well as the differences in coverage between the Illumina 450 k array and RRBS.
Discussion
To investigate the nature of telomeres and their maintenance mechanisms in cancer, we applied a functional genomics approach toward understanding molecular relationships across cancer cell lines. We estimated relative telomere content across over a thousand cancer cell lines and thus provide a useful reference for further studies on cancer cell line characteristics that have not to date considered this feature. We show that cell line telomere content indeed varies with factors such as tissue type, TERT mRNA expression, and mutations in genes such as DAXX and VHL. Moreover, we discovered novel relationships between telomere content and dependencies of CST and shelterin complex members, which was enabled by the high overlap between the cell lines profiled by our estimates and by several loss-of-function screens (McDonald et al., 2017; Meyers et al., 2017; Tsherniak et al., 2017).
Using these genome-wide gene dependency estimates, we found that increased sensitivity (Meyers et al., 2017) to depletion of CST complex members correlates with shorter telomeres, likely a consequence of the critical roles of the CST complex in both telomere protection and in terminating telomere elongation (Chen et al., 2012). Our findings raise the possibility that targeting the CST complex may preferentially affect cancer cells that harbor shorter telomeres, and telomere content may be used as a biomarker of drug response in tumors. Likewise, CST complex dependencies were positively associated with the dependencies of several shelterin complex components, reflecting their functional interactions. Among these shelterin complex members, we find that the responses to their depletion are highly dependent upon the presence of a wild-type TP53 gene, with TP53 mutants displaying reduced sensitivity to depletion of ACD, TERF1, and TINF2. Additional studies are required to validate these associations and to assess why only certain members of the shelterin complex show this TP53-dependent sensitivity effect.
In addition to telomere content, we also investigated the genomic landscape of telomere maintenance mechanisms, namely mechanisms of TERT reactivation, across cancer cell lines. The enrichment of TERT promoter mutations in certain tissues has inspired several explanations, and our findings in both the CCLE and TCGA suggest a specific epigenetic signature that may underlie this unique pathway of telomere maintenance. We found that in TERT promoter mutants, CpG islands were preferentially hypomethylated in PRC2-repressed regions located near telomeres, which may relate to previous reports of a long-range telomere position effect (Kim et al., 2016; Yuan et al., 2019) and of TERT expression necessitating specific chromatin states in promoter-wildtype and mutant samples (Salgado et al., 2019). Considering that normal tissues typically exhibit particularly low methylation of the TERT promoter (Salgado et al., 2019; Stern et al., 2017) and that PRC2 occupies the inactive allele in TERT promoter mutants (Stern et al., 2017), our genome-wide signature may relate to the latter part of the two-step mechanism proposed for TERTp mutation-driven telomerase upregulation (Chiba et al., 2017). Moreover, epigenetic mechanisms have been shown to produce synergistic effects with driver mutations in tumor evolution (Tao et al., 2019). Besides reflecting a direct cooperation with TERT expression, this signature raises the possibility that the ‘memory’ of short telomeres may be preserved through these telomere-proximal hypomethylated regions. It may also be indicative of the stemness of cell lines, which has been proposed as a major factor in the proliferative advantage of TERT promoter mutations (Chiba et al., 2015). Future studies will be necessary to elucidate the nature of this epigenomic signature, how it impacts the regulation of telomerase expression, and the complexities of TERT expression beyond binary measures of allele-specificity (Rowland et al., 2019). Furthermore, incorporation of telomere content into studies using cancer cell lines may help improve our understanding of sensitivities to drugs or genetic perturbations across cell lines.
Through our analysis, we show relevant markers of telomere-associated protein function, patterns of TERT reactivation across cancers, and epigenetic determinants of TERT promoter status. We detail various features of telomere regulation and dysfunction in cancer, and we provide a substantial addition of new features to a well-characterized set of cell lines. By doing so, we complement molecular studies of telomeres in parallel studies across the GTEx (GTEx Consortium et al., 2020), TCGA (Barthel et al., 2017), and PCAWG (PCAWG-Structural Variation Working Group et al., 2020) panels, providing a resource that will guide additional studies on the roles and functions of telomeres in cancer.
Materials and methods
Telomere content estimation
Request a detailed protocolTelomere content estimates were computed using Telseq (UK10K Consortium et al., 2014) with the default settings. Telseq records the frequencies of reads containing various frequencies of the canonical TTAGGG telomeric repeat, and then normalizes this number of telomeric repeats using a GC-adjusted coverage estimate and the average chromosome length.
Telomere content was estimated for WGS and WES samples in the CCLE (Ghandi et al., 2019) as well as WES samples in the GSDC (Yang et al., 2013) using the default settings. When multiple read groups were present in a sample, telomere content was computed as a mean of the individual read group estimates weighted by the total read count per group. Whereas we found decent agreement between overlapping samples in CCLE WGS and GDSC WES, we found a comparatively weak correlation between both sets and the CCLE WES estimates (Figure 1—figure supplement 2). Therefore, we excluded the CCLE WES telomere content estimates from subsequent analyses.
In comparing the CCLE WGS and GDSC WES estimates, we also noticed a batch effect resulting in two clusters of GDSC WES estimates. To identify and correct this batch effect across all GDSC WES estimates, we observed that these batches were distinguished by frequencies of reads containing exactly 4, 5, and 6 telomeric motifs. We then ran a k-means clustering on these read frequencies to estimate the clusters across all GDSC WES samples, which were subsequently adjusted by re-centering the mean of one cluster (after applying a z-scored log-transformation) to match the mean of the other.
We also attempted to use Telseq to estimate telomeric repeat-containing RNA (TERRA) expression across 1019 RNA-seq samples from the CCLE. However, because the majority of these samples were found to contain little or no reads containing telomeric reads, TERRA capture was determined to be too low for any meaningful analysis.
Cell lines were annotated with sample descriptors from the CCLE data portal (Cell_lines_annotations_20181226.txt, https://portals.broadinstitute.org/ccle/data). Harmonized sample information, telomere content estimates, and other matched annotations are available in Supplementary file 1.
Genomic and transcriptomic markers
Request a detailed protocolWe sourced mutations and copy number estimates from the DepMap download portal (https://depmap.org/portal/download/) under the public 19Q4 release (CCLE_mutations.csv and CCLE_gene_cn.csv, respectively). We used the mutation classifications detailed in the Variant_annotation column.
We also downloaded processed RNAseq estimates in the form of gene expression, transcript expression, and exon inclusion estimates from the CCLE data portal under the latest release (CCLE_RNAseq_rsem_genes_tpm_20180929.txt.gz, CCLE_RNAseq_rsem_transcripts_tpm_20180929.txt.gz, and CCLE_RNAseq_ExonUsageRatio_20180929.gct.gz, respectively). We also downloaded RPPA estimates (CCLE_RPPA_20181003.csv) and global chromatin profiling results (CCLE_GlobalChromatinProfiling_20181130.csv). Before performing subsequent analyses, we transformed transcript and gene expression TPMs by taking a log2-transform with a pseudocount of +1. We also excluded transcripts with a standard deviation in this log2(TPM + 1) measure of less than 0.25 across all cell lines. We excluded exons with missing inclusion values in over 800 cell lines or with a standard deviation of less than 0.1. Pearson correlations were then used to calculate associations between gene and transcript RNA expression levels of TERT and TERC against telomere content estimates as well as other markers. We also constructed linear models regressing merged telomere content as a function of various biomarkers with cell line primary site as a covariate. Moreover, we also calculated linear models of each of CCLE and GDSC telomere content as a function of TERT mRNA levels, TERC mRNA levels, primary site, TERTp mutant status, donor age, and calculated doubling time to examine the contributions of these specific factors. Out of these factors, we found TERT and TERC mRNA levels to be significantly correlated with both CCLE and GDSC telomere contents.
We also considered processed methylation estimates available on the CCLE data portal, namely the TSS 1 kb upstream estimates as well as the promoter CpG cluster estimates, which we correlated against telomere content estimates. For these annotations, we filtered out regions with a standard deviation of less than 0.05.
Results of TERT and TERC expression associations, as well as telomere content associations, are available in Supplementary file 2.
Gene dependency associations
Request a detailed protocolTo identify gene dependencies associated with telomere content, we considered knockout/knockdown effects estimated in the Avana CRISPR-Cas9 (Meyers et al., 2017), Achilles RNAi (Tsherniak et al., 2017), and DRIVE RNAi (McDonald et al., 2017) datasets. For Achilles and DRIVE, the datasets used were the April 2020 versions listed on the DepMap portal computed with DEMETER2. For Avana, we used gene effect scores from the February 2021 release. For each gene dependency in each dataset, we computed the Pearson correlation coefficient against telomere content estimates generated separately with CCLE WGS and GDSC WES data. Correlation p values were determined using the two-tailed Student’s t test. All correlation coefficients and p values were determined using the pearsonr function as part of the scipy.stats Python module.
To identify codependencies with the CST complex members, we employed an iterative approach to identify highly ranked correlations. In particular, starting with a seed set of genes (the base case of which was the CST complex), we searched for codependencies between two genes x and y under the criteria that the r2 association between the two is among the top five for x vs. all other genes, and among the top five for y vs. all other genes as well. We recursively applied this method four times, which added the five shelterin components ACD, POT1, TERF1, TERF2, and TINF2 to our gene set. To construct the clustered correlation matrix in Figure 2c, we used the clustermap function as provided by the Seaborn Python library, with Ward’s method for the determination of the hierarchical clustering.
To identify significant associations between dependencies and mutations, we compared dependencies against binary categories of damaging/truncating (comprised of deleterious alterations, such as nonsense and splice-site alterations) and hotspot (highly recurrent) mutations. Using the previously downloaded DepMap 19Q4 mutation annotations, we considered mutations as ‘damaging’/‘truncating’ if they were associated with a ‘damaging’ label under the Variant_annotation column, and we considered mutations as ‘hotspot’ if they were labeled as such in the corresponding COSMIC (Tate et al., 2019) (isCOSMIChotspot) or TCGA (isTCGAhotspot) columns. We excluded mutations with a total damaging or hotspot frequency of less than five across all profiled CCLE samples. Mutations were then compared with dependencies using a two-sided Mann-Whitney U test, with the two classes being non-damaging and non-hotspot mutant samples, and damaging and hotspot mutant samples, respectively.
To rank and visualize the codependencies shown in Figure 2—figure supplement 2a,b and the dependency-mutation associations shown in Figure 2d,e, we used a signed q-value approach. We first transformed the raw false discovery rates by taking the negative of the base-10 logarithm, and we then applied a sign to this transformed value as determined by the direction of the codependency (the sign of the correlation coefficient) or dependency-mutation association (negative for greater sensitivity in mutants, and positive otherwise).
Dependency analyses results are available in Supplementary file 3.
Characterization of allele-specific TERT expression
Request a detailed protocolAllele-specific expression may be detected by looking for discordant counts of reads mapping to single-nucleotide polymorphisms (SNPs) in DNA-sequencing vs. RNA-sequencing reads (Huang et al., 2015). In particular, allele-specific expression is evidenced by the biased frequency of a single allele of a heterozygous SNP in RNAseq reads compared to that of DNA-sequencing reads. To assess TERT expression in the context of allele-specificity, we examined cell lines for which DNA (WES or WGS) and RNA (RNAseq) sequencing data were available. To identify heterozygous anchor SNPs, we considered mutations in the TERT gene body called using Mutect 1.1.6 (Cibulskis et al., 2013) with default settings. We then applied a filter for mutations with at least eight reads supporting both the reference and alternate alleles that passed the Mutect quality control filter (i.e. classified as PASS). To force call the matching allele frequencies in RNA, we processed the matching aligned RNAseq reads using the ASEReadCounter tool provided in GATK 3.6 (Van der Auwera et al., 2013) with arguments -minDepth 8, --minBaseQuality 16, --minMappingQuality 255, and -U ALLOW_N_CIGAR_READS.
We then used these RNA and DNA allele frequencies to classify cell lines as monoallelic and biallelic expressors of TERT as well as two neighboring genes, SLC6A19 and CLPTM1L. In particular, we examined the odds ratio derived from a binary contingency table with the two sets of categories being the context (DNA vs. RNA) and the allele (reference vs. alternate) of the read counts. To account for edge cases where the denominator of the odds ratio was zero, we added a pseudocount of 0.5 to each category before computing the odds ratio. We then denoted MAE lines as those having an odds ratio computed using the major allele as the denominator of greater than five. In instances where there were multiple informative SNPs, we considered only the SNP with the greatest supporting total RNA-seq read count. In cases where the same SNP was detected across multiple sources (for instance, in both CCLE WES and WGS), we considered the source with the greatest coverage of the SNP.
Allele-specific calls for TERT, SLC6A19, and CLPTM1L are described in Supplementary file 4.
Genome-wide allele-specific methylation analysis
Request a detailed protocolTo characterize and compare CpG-level ASM around the TERT genomic region, we utilized RRBS data generated by the CCLE (Ghandi et al., 2019). Mapped BAM files were downloaded from the CCLE FireCloud workspace, and ASM levels for each CpG pair were estimated using the allelicmeth command from the MethPipe package (Song et al., 2013). Within each sample, we first included only CpG pairs with a minimum coverage of eight reads. Next, among all 928 cell lines, CpG pairs included in less than 5% of these samples were excluded.
To estimate ASM, we employed a strategy similar to the original MethPipe ASM pipeline. For each pair of CpGs, we considered the four combinations of methylation states between the two CpGs: methylated-methylated (mm), methylated-unmethylated (mu), unmethylated-unmethylated (uu), and unmethylated-unmethylated (mm). For semi-methylated CpG pairs not subject to ASM, we would expect high and relatively equal frequencies of the mu and um pairs, whereas for ASM CpG pairs, we would expect the allele bias to result in high mm and uu counts and low um and mu counts. To quantify this imbalance, we used the mean square contingency coefficient (Φ) with a pseudocount of 0.5. Namely, for each CpG pair, we computed
where , , , and . ASM CpG pairs therefore had a positive Φ, whereas non-ASM pairs had a Φ of around 0. We rounded negative Φ values to 0. Before computing these imbalance values, we excluded CpG pairs with a methylation level of less than 0.1 or greater than 0.9 on either CpG, so as to filter out CpG pairs that were likely to be fully methylated or completely demethylated.
We first examined the ASM levels of the TERT locus, which considered as the TERT gene body as well as the flanking five kilobase regions. For these methylation estimates, we excluded CpGs with less than 25% valid ASM estimates. We segmented these CpGs into five regions: the promoter (chr5:1295246–1298643), CGI_1 (chr5:1294872–1295134), CGI_2 (chr5:1291374–1294439), CGI_3 (chr5: 1289695–1291090), and the remaining gene body (chr5: 1249661–1289359). We computed ASM imbalance values for these regions by taking the mean CpG-pair ASM values.
To identify genome-wide methylation events indicative of TERT promoter mutations, we searched for correlates with TERT promoter mutation status among average methylation levels of CpG islands (CGIs). CpG island annotations were downloaded from the UCSC genome browser at http://hgdownload.cse.ucsc.edu/goldenpath/hg19/database/cpgIslandExt.txt.gz. To filter out low-coverage CpG islands, we considered only CpG islands with at least eight CpG sites. Methylation levels per island were then estimated by taking the mean across all CpGs profiled within the island. Using the same filtering parameters, we also computed mean ASM estimates across these CGIs.
TERT locus methylation estimates are described in Supplementary file 5.
Chromatin profiling data
Request a detailed protocolTo identify histone modifications associated with TERTp status, we downloaded global chromatin profiling data from the CCLE Data Portal (CCLE_GlobalChromatinProfiling_20181130.csv). Correlations between TERTp status and histone modification levels, as well as correlations between H3K9ac1K14ac0 levels and CGI ASM levels, are described in Supplementary file 5.
Region enrichment analysis
Request a detailed protocolTo characterize the regions that were hypomethylated in association with TERT promoter mutations, we utilized Locus Overlap Analysis (LOLA) (Nagraj et al., 2018; Sheffield and Bock, 2016), which discovers enriched regions among a background set. LOLA takes as input two region sets: the regions of interest and a background universe set. Both sets are then overlapped against a database of annotated regions, and overlapping and non-overlapping region frequencies are computed per annotated region set. Region overlap significance is then assessed against each annotated region set using Fisher’s exact test with the two categories being the regions of interest vs. the background universe set and the overlap of each of these regions with the annotated region set.
We ranked hypomethylated CpG island regions by the significance of the change in TERTp wild type vs. TERTp mutant cell lines as assessed by a two-sided Mann-Whitney U test (i.e. regions with the most significant changes were ranked first). The top 1000 CpG islands were then used as the regions of interest, and the set of all CpG islands examined served as the background universe set. We utilized the LOLAweb application (at http://lolaweb.databio.org/), with the LOLACore and LOLARoadMap sets as the region databases.
Outputs of LOLA on CCLE TERT promoter-mutant hypomethylated CGIs are summarized in Supplementary file 6.
TCGA data
Request a detailed protocolTCGA methylation and normalized gene expression estimates were downloaded from the UCSC Xena browser (Goldman et al., 2019) (http://xena.ucsc.edu/, jhu-usc.edu_PANCAN_HumanMethylation450.betaValue_whitelisted.tsv.synapse_download_5096262.xena and EB++AdjustPANCAN_IlluminaHiSeq_RNASeqV2.geneExp.xena). Methylation levels of CGIs was estimated by averaging CpG methylation levels within each CGI, and CGIs with less than four profiled CpGs were excluded. TERT promoter mutation status was obtained from previous estimates (Barthel et al., 2017). The same hypomethylation-enrichment analysis previously described for the CCLE was then run on the top 1000 CGIs hypomethylated in TERT promoter-mutants using an identical ranking scheme.
A summary of LOLA results on TCGA methylation, TERT promoter status, and ALT-likelihood is provided in Supplementary file 7.
Statistical analysis
Request a detailed protocolMultiple hypothesis testing was accounted for using the Benjamini-Hochberg FDR with an alpha of 0.01 as provided by the statsmodels Python module.
Data availability
Telomere content estimates can be found in the supplementary materials and have been uploaded to the Cancer Dependency Map portal (https://depmap.org/portal/).
-
Gygi LabID gygi.hms.harvard.edu/publications/ccle.html. Quantitative Proteomics of the Cancer Cell Line Encyclopedia.
-
Dep Map PortalID depmap.org/portal/download/. DepMap 20Q2 Public.
-
European Genome-phenome ArchiveID EGAS00001000978. GDSC whole-exome sequencing data.
References
-
Alternative lengthening of telomeres: models, mechanisms and implicationsNature Reviews Genetics 11:319–330.https://doi.org/10.1038/nrg2763
-
Sensitive detection of somatic point mutations in impure and heterogeneous Cancer samplesNature Biotechnology 31:213–219.https://doi.org/10.1038/nbt.2514
-
Shelterin: the protein complex that shapes and safeguards human telomeresGenes & Development 19:2100–2110.https://doi.org/10.1101/gad.1346005
-
The RNA component of human telomeraseScience 269:1236–1241.https://doi.org/10.1126/science.7544491
-
The centennial of the Cajal bodyNature reviews. Molecular cell biology 4:975–980.https://doi.org/10.1038/nrm1262
-
Prevalence of the alternative lengthening of telomeres telomere maintenance mechanism in human Cancer subtypesThe American Journal of Pathology 179:1608–1615.https://doi.org/10.1016/j.ajpath.2011.06.018
-
Telomere sequence content can be used to determine ALT activity in tumoursNucleic Acids Research 46:4903–4918.https://doi.org/10.1093/nar/gky297
-
DNA hypermethylation within TERT promoter upregulates TERT expression in CancerJournal of Clinical Investigation 129:1801.https://doi.org/10.1172/JCI128527
-
TERT promoter mutations and telomeres during tumorigenesisCurrent Opinion in Genetics & Development 60:56–62.https://doi.org/10.1016/j.gde.2020.02.001
-
A box H/ACA small nucleolar RNA-like domain at the human telomerase RNA 3' endMolecular and Cellular Biology 19:567–576.https://doi.org/10.1128/MCB.19.1.567
-
LOLAweb: a containerized web server for interactive genomic locus overlap enrichment analysisNucleic Acids Research 46:W194–W199.https://doi.org/10.1093/nar/gky464
-
Multivalent engagement of chromatin modifications by linked binding modulesNature Reviews Molecular Cell Biology 8:983–994.https://doi.org/10.1038/nrm2298
-
Expression of human telomerase subunits and correlation with telomerase activity in cervical CancerCancer Research 58:1558–1561.
-
COSMIC: the catalogue of somatic mutations in CancerNucleic Acids Research 47:D941–D947.https://doi.org/10.1093/nar/gky1015
-
Estimating telomere length from whole genome sequence dataNucleic Acids Research 42:e75.https://doi.org/10.1093/nar/gku181
-
From fastQ data to high-confidence variant calls: the genome analysis toolkit best practices pipelineCurrent Protocols in Bioinformatics 43:43.https://doi.org/10.1002/0471250953.bi1110s43
-
OB fold-containing protein 1 (OBFC1), a human homolog of yeast Stn1, associates with TPP1 and is implicated in telomere length regulationJournal of Biological Chemistry 284:26725–26731.https://doi.org/10.1074/jbc.M109.021105
-
Direct activation of TERT transcription by c-MYCNature genetics 21:220–224.https://doi.org/10.1038/6010
-
Frequent amplification of the telomerase reverse transcriptase gene in human tumorsCancer Research 60:6230–6235.
-
Telomerase RNA accumulates in cajal bodies in human Cancer cellsMolecular Biology of the Cell 15:81–90.https://doi.org/10.1091/mbc.e03-07-0525
Decision letter
-
C Daniela Robles-EspinozaReviewing Editor; International Laboratory for Human Genome Research, Mexico
-
Anna AkhmanovaSenior Editor; Utrecht University, Netherlands
-
C Daniela Robles-EspinozaReviewer; International Laboratory for Human Genome Research, Mexico
-
Floris BarthelReviewer; The Jackson Laboratory, United States
In the interests of transparency, eLife publishes the most substantive revision requests and the accompanying author responses.
Acceptance summary:
This manuscript is of interest for researchers in the cancer genomics and telomere maintenance fields. Telomere maintenance is critical in tumour development as it allows cells to divide indefinitely. By studying a large collection of cell lines, the authors have identified genomic features broadly correlated with telomere content, including telomerase expression and mutation, as well as differences between tissues of origin, dependencies among different telomere maintenance genes, and correlations with region-specific methylation patterns. Overall, its value lies in the novel biological insights it provides regarding telomere maintenance and the massive resource it provides to the scientific community.
Decision letter after peer review:
Thank you for submitting your article "Integrated evaluation of telomerase activation and telomere maintenance across cancer cell lines" for consideration by eLife. Your article has been reviewed by 3 peer reviewers, including C Daniela Robles-Espinoza as the Reviewing Editor and Reviewer #1, and the evaluation has been overseen by Maureen Murphy as the Senior Editor. The following individual involved in review of your submission has agreed to reveal their identity: Floris Barthel (Reviewer #2).
The reviewers have discussed their reviews with one another, and the Reviewing Editor has drafted this to help you prepare a revised submission.
In this manuscript, Hu, Ghandi and Huang describe a number of genomic features that are correlated with telomere content, and focus on the study of cancer cell lines from two different sources (the Cancer Cell Line Encyclopedia, CCLE) and the Genomics of Drug Sensitivity in Cancer (GDSC). They describe genomic alterations, gene dependencies, telomerase (TERT) expression and promoter mutation, and methylation patterns both in the TERT locus and beyond. The study is impressive in scope, as putting together data from distinct cell lines from different batches and sources is not an easy task.
However, the reviewers have a number of suggestions to improve the conclusions and readability of the manuscript.
Essential revisions
1. About the structure of the manuscript.
Two reviewers commented that the manuscript is at times hard to understand due to it covering a large number of observations that not always follow logically from each other. Specifically, it was mentioned that the ATRX/DAXX section does not add much to the paper, and that a substantial part of analyses and supplementary data are not mentioned (e.g., What is CERES and DEMETER2 dependency, in Figure S6c and d?). Please re-check these issues, and if possible, simplify the paper to convey a clear message.
2. About cell lines.
Please answer the following questions.
a. Does the CCLE provide any information on cell line source? Do the authors have any cell lines with biological replicates from different sources that could determine how this affects telomere length, as in eg. PMID 30089904? Also, are there any technical replicates derived from unique libraries prepared from the same cell line to establish that these show similar measurements?
b. Do the authors think their conclusions are affected by the fact that cell lines are immortalised, and this process in many cases includes the activation of TERT? Can their conclusions be extended to living tissues?
c. Can the number of cell lines from both sets that were taken forward for downstream analyses be specified in the main text?
3. About telomere measurements and correlations.
a. The raw telomere content measurements from WGS are not discussed much and quickly discarded in favor of normalized estimates. Looking at the supplementary table these look to be all on a similar and comparable scale for WGS. It strikes me that WGS-based estimates are much more reliable, and some associations may be stronger using these numbers despite being limited to a much smaller sample size. Can the authors show the overlap between WGS/WXS and associated data (CRISPR screens, RNAseq, etc)? Do the described findings hold when limited to WGS raw estimates or is there insufficient power? Also, an explanation of why correlation is higher between WES and WGS of different datasets than with those of the same dataset would be welcome.
b. While the authors look at the patients age at tissue extraction and find a correlation between estimated telomere content and age, one wonders if cell culture specific parameters could not be even stronger determinants. Have the authors looked at time in culture, passage number and population doubling level (PDL) in relation to telomere length?
c. Could the authors infer ALT status for a large number of cell lines from other resources? Capital Biosciences was awarded NIH SBIR funding several years ago specifically to determine the ALT status of all cell lines in the ATCC.
See for example:
1. https://www.eposters.net/pdfs/identification-of-new-alternative-lengthening-of-telomeres-positive-cancer-cell-lines-using-the-c.pdf
2. https://www.atcc.org/~/media/PDFs/Presentations/HOC%20poster%20ALT.ashx
At a minimum, perhaps the authors could annotate lines known to exhibit ALT as ALT+ and others as ALT unknown?
d. It is slightly surprising there is a positive correlation between telomere content and TERT mRNA in CNS since CNS tumors are often ALT+ (and telomerase-). Possibly none of the CNS lines assayed were ALT+? Perhaps the correlation between TERT mRNA and telomere content needs to be adjusted for ALT status because of the presumed interaction. Could the authors incorporate multiple correlates of telomere content in a model and determine their independent contributions and interactions? Perhaps this could be limited to WGS and using the raw measurements.
e. Some correlations with TC (for example all Figure S3) are likely to be biased by the important TC differences between tissue types. Can the authors perform this analysis on each tissue separately, or normalize TC to use relative TC within a tissue type?
4. About figures.
a. Figure 1.
– This is an informative figure. Can a clarification of the total number of cell lines be added?
– It says it is 683 but from the main text it seems that many more were taken forward for analysis (1,056 GDSC with WES and 329 WGS CCLE, with only 286 overlapping)? Why are the others not depicted here?
– If the outlier U2-OS cell line is removed from the 44 bone cell lines, is the bone cell line average still the highest?
– Can the ATRX/DAXX expression be included?
b. Figure 2.
– Figure 2a-b: the authors claim that dependency toward TRF1 and the CST complex is higher in cells with short telomeres. However, they show a positive correlation between TC and AVANA dependency. This may be interpreted that cells with longer telomeres are more sensitive to loss of TRF1 or the CST complex. That could be explained by their role during telomere replication, and the likely higher replication stress at longer telomeres. If this is not the correct explanation, can the explanation be reworded, as it is confusing?
– Figure 2d-e: Does each dot represent a cell line? If so, are the conclusions (model in 2f) drawn out of only 1 cell with a mutation in a hotspot?
c. Figure 3a.
– Here, it seems that cells with lower PML expression are more sensitive to DAXX suppression. Could the analysis be done with telomerase positive cells only? Similarly, cells with higher levels of ZMYM3 are more sensitive to DAXX suppression. What does that mean biologically? Can you please specify what are the dots?, do they refer to cell lines? So these all represent the top score of their corresponding cell line?
5. Other points
a. The introduction and the narrative tell us that telomerase is activated, that telomeres are usually longer in cancer, etc. Therefore, the phrase "The 55 non-cancerous samples profiled as part of the GDSC also displayed relatively high telomere contents (P = 5.9×10-4, two sided Mann-Whitney U test), consistent with previous reports of widespread telomere shortening in cancer (Barthel et al., 2017)." seems to contradict what has been said. So, is it expected that the normal samples would have long telomeres? Would this be expected physiologically or is it an effect from these being cell lines? Also – is the information for these normal cell lines displayed in Figure 1? Reviewers recommend to add the normals to this figure so it is easier to compare between normal and cancer tissues.
b. Could the definition of what a dependency means, exactly, be included? Does it assess correlated gene expression? Does it assess activation of one gene upon knockout of the other? It would be helpful to spell this out in the main text. This may help understand the observed dependencies between the members of the shelterin complex and those of the CST complex.
c. Page 6 says "increased sensitivities to knockout of the CST complex components, which are key mediators of telomere capping and elongation termination (Chen et al., 2012), were correlated with lower telomere content." However, it seems that supplementary figure 3b indicates that the CST genes are positively correlated with telomere content? If this is an index of sensitivity, could this be indicated somewhere? (At the moment it seems similar to the TERT figure, so readers may read it in the same way which may be confusing). Could TERF1 be highlighted in Figure S3b please, DRIVE dependencies?
d. About methylation studies. In page 11 it says "TERTp mutants exhibited strong and significant (P < 0.01) increases in ASM (allele-specific CpG methylation) in the promoter (n = 485), remaining gene body (n = 493), and exon 1 (n = 478) regions", but then in page 12, line 292 it says "The observation that TERT promoter mutants display a hypomethylated TERT locus…" – does this not directly contradict the statement above? Or is it referring to methylation on a specific region? Could this be clarified please?
Reviewer #1:
In this manuscript, Hu, Ghandi and Huang describe a number of genomic features that are correlated with telomere content, and focus on the study of cancer cell lines from two different sources (the Cancer Cell Line Encyclopedia, CCLE) and the Genomics of Drug Sensitivity in Cancer (GDSC). They describe genomic alterations, gene dependencies, telomerase (TERT) expression and promoter mutation, and methylation patterns both in the TERT locus and beyond. The analyses are carefully done and largely convincing, and impressive in scope as putting together data from distinct cell lines from different batches and sources is not an easy task.
Their major claims are that:
– TERT expression correlates with telomere content in lung, central nervous system, and leukemia cell lines. This claim is supported by their analyses of telomere tract quantification and TERT mRNA levels in the same cell lines (Figure S4). However, some negative correlations in other cell lines were also found, which are currently unexplained.
– Lower telomeric content is associated with dependency of CST telomere maintenance components. This is supported by an analysis of the Avana dependency dataset, which has been previously published. This makes sense biologically and is clear in Figure S3.
– Increased dependencies of shelterin member genes are associated with wild-type TP53 status. An extended analysis as the one before in the Avana and DRIVE dependency datasets support this claim. This would mean that cells are less sensitive to shelterin depletion when they are mutated in TP53, which makes sense biologically.
– Monoallelic expression in TERT promoter-mutant contexts. The analysis behind this claim, based on alignment of RNA reads to heterozygous regions and the subsequent counting of reads mapping to each allele, seems robust, although it would have been nice to see a few more details (for example, what is the measure of linkage disequilibrium between the TERT promoter mutations and these anchor SNPs).
– TERT promoter-mutant cell lines show hypomethylation at PRC2-repressed regions. This claim is based on an analysis of genome-wide reduced representation bisulfite sequencing (RRBS) data in nearly 1000 cell lines, and the finding of associations between TERT promoter mutation and CpG methylation of different regions of the TERT gene and other loci. An overlap analysis showed that these methylated regions were enriched for PRC2-repressed regions.
All in all, I believe the claims in this study are well supported. This work constitutes a valuable resource for the scientific community, and has made new observations that contribute to the elucidation of telomere maintenance mechanisms.Reviewer #2:
Kevin Hu, Mahmoud Ghandi and Franklin W. Huang present their important work on determining telomere content across > 1000 cancer cell lines. In addition to this highly valuable dataset, the authors present several scientific vignettes wherein various questions are addressed using their curated resource. Using CRISPR/Cas9 screening data from the same set of cell lines, they find that sensitivity to CST (CTC1, STN1, TEN1) knockout was associated with telomere content and that cell lines sensitive to CST knockout demonstrated lower telomere content. Shelterin complex members (ACD, POT1, TERF1, TERF2, TINF2) were further identified as co-dependencies to CST knockout.
They find that cell lines with monoallelic expression, as in a TERT promoter mutant setting, also demonstrate allele-specific methylation. Interestingly, TERT promoter mutant demonstrated gene body hypomethylation not observed in other monoallelic TERT expressors. Moreover, this hypomethylation was found to be genome-wide and not restricted to the TERT locus. Finally, nearly all of this hypomethylation was localized in PRC2-repressed regions.
The hypothesis-driven vignettes are clearly of interest in providing correlative insights that could fuel future mechanistic studies. More importantly, they exemplify the strengths of the resource. Nevertheless, the primary value of the manuscript in my opinion is the enormous resource of telomere content estimates for cell lines widely used in biomedical research. The authors can do a more thorough job showing the reader that their measurements are reliable and whether or not they would be generally applicable to fresh cell lines purchased from a vendor.
1. WGS-based telomere content estimates are likely much more reliable than WXS-based estimates. The authors do not comment on this.
2. Cell lines from different sources can vary substantially. For example, HeLa strains from different laboratories can show vastly different karyotypes (PMID 30778230). These caveats are not currently discussed.
3. Telomere length attrition is replication dependent, however the authors have not looked at whether their measurements are associated with population doublings in individual samples or across multiple samples taken at different PDLs.
4. ALT status is an important confounder of telomere content. The authors hint at ALT status in their manuscript at various points, but do not incorporate ALT status into their analyses.
5. The combination of a number of parameters is going to determine telomere content in cell lines. It would be helpful to understand the combined contribution of multiple parameters.Reviewer #3:
In this manuscript, Huang and colleagues used public whole genome or whole exome sequencing datasets to establish telomere content in over a thousand cancer cells lines. Once telomere content was established, they linked it to other available sets of analysis, including RNAseq libraries and genome-wide CRISPR or RNAi screens.
First, they found that telomere content is highly variable among tissue types, with highest telomere content (TC) in tissues that have higher frequencies of ALT activation. They also found weak correlation with TERT or TERC expression. They then analyzed the correlation between telomere content and gene dependencies and found that dependencies to TRF1 and the CST correlates with TC. Independently of TC, they analyzed correlations between gene dependency of telomeric proteins or ATRX and DAXX with mutations or transcriptomic and proteomic profiles. Finally, they analyzed the link between TERT allelic expression and epigenetic marks on TERT promoter.
Although such broad analysis of telomere length or telomere proteins / TERT dependencies is interesting, it appears hard to really see the biological significance of certain correlations or the novelty of the findings.
https://doi.org/10.7554/eLife.66198.sa1Author response
Essential revisions
1. About the structure of the manuscript.
Two reviewers commented that the manuscript is at times hard to understand due to it covering a large number of observations that not always follow logically from each other. Specifically, it was mentioned that the ATRX/DAXX section does not add much to the paper, and that a substantial part of analyses and supplementary data are not mentioned (e.g., What is CERES and DEMETER2 dependency, in Figure S6c and d?). Please re-check these issues, and if possible, simplify the paper to convey a clear message.
Thank you for this comment and we agree. To improve the flow and clarity of the paper we have now removed the ATRX/DAXX section from the paper. We have also revised the axis labels of Figure S6c and d such that CERES and DEMETER2 dependency are replaced by Avana and DRIVE (the correct datasets used, whereas CERES and DEMETER2 correspond to the scoring algorithms). We appreciate the opportunity to simplify the paper.
2. About cell lines.
Please answer the following questions.
a. Does the CCLE provide any information on cell line source?
Yes, the CCLE does provide additional cell line source attributes. We have added these sources, as well as additional information on growth medium, freezing medium, doubling time, pathologist’s annotation, and TCGA code to Supplementary Table S1.
Do the authors have any cell lines with biological replicates from different sources that could determine how this affects telomere length, as in eg. PMID 30089904?
The CCLE WES, CCLE WGS, and GDSC WES datasets are of different biological replicates, and we show that there is a moderate correlation between CCLE WES and CCLE WGS telomere content as well a strong correlation between CCLE WGS and GDSC WES telomere content.
Also, are there any technical replicates derived from unique libraries prepared from the same cell line to establish that these show similar measurements?
We thank the reviewers for raising this question. Within the CCLE WGS dataset, we have since determined that the HEL 92.1.7 cell line was twice-sequenced. We have now run Telseq on this second sequencing run and found both estimates to be about 4 kilobases, showing a similar measurement. We have updated the manuscript accordingly.
b. Do the authors think their conclusions are affected by the fact that cell lines are immortalised, and this process in many cases includes the activation of TERT?
As cell lines are immortalised frequently through the activation of TERT, we think this is a useful setting in which to study telomerase expression and telomere length. So while we cannot exclude the fact that this common process is affected differently in cancer tissues, we think the immortalization process in cell lines offers a reasonable comparison to cancers and think our conclusions can also apply to cancer tissues.
Can their conclusions be extended to living tissues?
This is a very interesting question. We think that while there certainly are limitations of cancer cell lines in terms of their similarity to tumor tissues, the extent to which they can be studied and perturbed affords significant advantages, and common next-gen sequencing methods can be used alongside to assess telomere content, since many advanced cancers are now undergoing next-gen sequencing as part of routine clinical care. We do think our conclusions about the sensitivities of cancer cell lines based on telomere content to depletion of telomere maintenance mechanisms opens potential new strategies for targeting cancers by taking into account telomere content/length that will need further validation.
c. Can the number of cell lines from both sets that were taken forward for downstream analyses be specified in the main text?
Yes, we have now specified this number in the main text.
3. About telomere measurements and correlations.
a. The raw telomere content measurements from WGS are not discussed much and quickly discarded in favor of normalized estimates. Looking at the supplementary table these look to be all on a similar and comparable scale for WGS.
Although raw telomere content measurements from WGS are reliable based on the results of the original Telseq paper, we ultimately used a z-scored transformation to make these estimates more compatible with the WES-based content estimates. In addition, we applied a log-transformation on all telomere content estimates because of a strong skew towards cell lines with markedly high telomere contents.
To illustrate that our transformations do not markedly affect the results, we have added the raw CCLE WGS telomere content estimates to Figure S5a (correlation plots of CST complex dependencies versus telomere content). In these examples, we show that raw estimates are still correlated with the dependencies, although the linear relationship is weaker due to the skew in the raw telomere content.
It strikes me that WGS-based estimates are much more reliable, and some associations may be stronger using these numbers despite being limited to a much smaller sample size.
We agree with this point, and we would like to note that we indeed ran analyses such as the associations between telomere content and dependency (Figure 2), between telomere content and age as well as tissue type (Figure S2), and between telomere content and TERT and TERC transcriptomics. For Figure S3, we decided against using WGS estimates along because the overlap between WGS and datasets such as mass-spectrometry protein expression and DRIVE dependencies would have been relatively small, and because we believed that any strong associations would have been present anyway, as with the Avana associations with TERF1 and the CST complex members.
Can the authors show the overlap between WGS/WXS and associated data (CRISPR screens, RNAseq, etc)?
Yes, we have now added a summary figure to Figure S1 showing the totals within each dataset as well as the number of individual cell lines containing each annotation, in the format of a binary heatmap.
Do the described findings hold when limited to WGS raw estimates or is there insufficient power?
The described findings hold when limited to WGS raw estimates. The transformation used to produce the WGS transformed estimates was just a log-transformation followed by a linear z-score scaling, so our findings are not so sensitive to whether the WGS raw or transformed estimates were used. To illustrate, we have added plots of raw CCLE WGS telomere content versus CST complex dependencies to Figure S5.
Also, an explanation of why correlation is higher between WES and WGS of different datasets than with those of the same dataset would be welcome.
We have added histograms showing the distribution of the total numbers of reads containing at least 6 telomeric repeats among cell lines for each sequencing dataset to Figure S1. CCLE WGS lines had around 100,000 such reads, whereas GDSC WES lines had about 100-1,000 and most CCLE WES lines had less than 100. These read numbers suggest that CCLE WGS offers the most accurate estimates, followed by GDSC WES and then CCLE WES. Moreover, the difference in total telomeric reads detected would also explain why the correlation between CCLE WGS and GDSC WES (r=0.84) is much higher than that between CCLE WGS and CCLE WES (r=0.71), which itself exceeds that of CCLE WES versus GDSC WES (r=0.60).
b. While the authors look at the patients age at tissue extraction and find a correlation between estimated telomere content and age, one wonders if cell culture specific parameters could not be even stronger determinants. Have the authors looked at time in culture, passage number and population doubling level (PDL) in relation to telomere length?
We thank the Reviewer for this suggestion. We were able to find annotations for calculated doubling time in the CCLE, which appear to be weakly negatively correlated with telomere content. We have appended these plots to Figure S2.
c. Could the authors infer ALT status for a large number of cell lines from other resources? Capital Biosciences was awarded NIH SBIR funding several years ago specifically to determine the ALT status of all cell lines in the ATCC.
See for example:
1. https://www.eposters.net/pdfs/identification-of-new-alternative-lengthening-of-telomeres-positive-cancer-cell-lines-using-the-c.pdf
2. https://www.atcc.org/~/media/PDFs/Presentations/HOC%20poster%20ALT.ashx
At a minimum, perhaps the authors could annotate lines known to exhibit ALT as ALT+ and others as ALT unknown?
We have compiled a table of ALT status across the linked studies as well as additional ones from the literature and appended these to Table S1, with 76 annotated cell lines in total.
d. It is slightly surprising there is a positive correlation between telomere content and TERT mRNA in CNS since CNS tumors are often ALT+ (and telomerase-). Possibly none of the CNS lines assayed were ALT+? Perhaps the correlation between TERT mRNA and telomere content needs to be adjusted for ALT status because of the presumed interaction. Could the authors incorporate multiple correlates of telomere content in a model and determine their independent contributions and interactions? Perhaps this could be limited to WGS and using the raw measurements.
We have constructed models incorporating TERT mRNA expression, TERC mRNA expression, cell line primary site, TERTp status, donor age, and calculated doubling time for predicting CCLE WGS and GDSC WES telomere content separately. The outputs of these models are detailed in Table S2.
Of the CNS lines for which we found ALT status in the literature, we know that PFSK1 and DAOY are ALT+, and we also know that A172, KNS42, U251MG, MOGGUVW, and SW1088 are not. The majority of CNS tumors in this dataset from the CCLE are TERT+.
e. Some correlations with TC (for example all Figure S3) are likely to be biased by the important TC differences between tissue types. Can the authors perform this analysis on each tissue separately, or normalize TC to use relative TC within a tissue type?
This is an interesting suggestion. We have updated the analysis to include tissue type as a covariate (Table S2 and Figure S4b).
4. About figures.
a. Figure 1.
– This is an informative figure. Can a clarification of the total number of cell lines be added?
– It says it is 683 but from the main text it seems that many more were taken forward for analysis (1,056 GDSC with WES and 329 WGS CCLE, with only 286 overlapping)? Why are the others not depicted here?
We thank the reviewer for this positive feedback. In this figure, we omitted cell lines that lack expression and copy number annotations for TERT and TERC, merged (CCLE WGS and GDSC WES) telomere content estimates, and ATRX and DAXX mutation profiling for clarity. We also omitted tissue types with less than 10 cell lines present after this preliminary filtering was applied. This filtering was done for clarity, as it would have been difficult to display cell lines with missing values in a concise format. (We have since made an exception for non-cancerous cell lines as per the recent comments). In total, 738 cell lines are shown in this figure.
– If the outlier U2-OS cell line is removed from the 44 bone cell lines, is the bone cell line average still the highest?
If the U2-OS cell line is removed, the bone cell line average is no longer the highest (blood becomes the highest). To account for the impact of cell lines with particularly high telomere contents, we have revised this figure to sort by median, rather than mean, telomere content.
– Can the ATRX/DAXX expression be included?
We have now added ATRX and DAXX expression levels to the bottom of the Figure 1.
b. Figure 2.
– Figure 2a-b: the authors claim that dependency toward TRF1 and the CST complex is higher in cells with short telomeres. However, they show a positive correlation between TC and AVANA dependency. This may be interpreted that cells with longer telomeres are more sensitive to loss of TRF1 or the CST complex. That could be explained by their role during telomere replication, and the likely higher replication stress at longer telomeres. If this is not the correct explanation, can the explanation be reworded, as it is confusing?
We thank the Reviewer for this helpful suggestion. We have reworded our discussion of the association between telomere content and CST complex dependencies to make this point clearer. In particular, the numerical dependency scores that we use are calculated such that a lower score indicates a stronger (or higher) dependency. Therefore, we meant to say that cells with longer telomeres are less sensitive to loss of TRF1 and the CST complex, as evidenced by a higher raw dependency score.
– Figure 2d-e: Does each dot represent a cell line? If so, are the conclusions (model in 2f) drawn out of only 1 cell with a mutation in a hotspot?
Here, each dot represents a mutation, and we aim to show the mutations that are associated with the respective dependencies of telomere-related proteins. We have remade this figure in the format of a volcano plot to make this point more clear.
c. Figure 3a.
– Here, it seems that cells with lower PML expression are more sensitive to DAXX suppression. Could the analysis be done with telomerase positive cells only? Similarly, cells with higher levels of ZMYM3 are more sensitive to DAXX suppression. What does that mean biologically? Can you please specify what are the dots?, do they refer to cell lines? So these all represent the top score of their corresponding cell line?
In response to previous concerns regarding the low contribution of this figure to the paper, we have removed this figure from the text.
5. Other points
a. The introduction and the narrative tell us that telomerase is activated, that telomeres are usually longer in cancer, etc. Therefore, the phrase "The 55 non-cancerous samples profiled as part of the GDSC also displayed relatively high telomere contents (P = 5.9×10-4, two sided Mann-Whitney U test), consistent with previous reports of widespread telomere shortening in cancer (Barthel et al., 2017)." seems to contradict what has been said. So, is it expected that the normal samples would have long telomeres? Would this be expected physiologically or is it an effect from these being cell lines? Also – is the information for these normal cell lines displayed in Figure 1? Reviewers recommend to add the normals to this figure so it is easier to compare between normal and cancer tissues.
We appreciate this comment and agree it enables a better comparison and have now added the normal cell lines to this figure (they were previously excluded due to a lack of expression and copy number profiling in the CCLE). We have also revised the introduction to state that cancers in general tend to have shorter telomeres than normal tissues, perhaps as a result of telomere maintenance mechanisms emerging only after telomeres have become sufficiently short in a crisis or because there is some fitness advantage to shorter telomeres.
b. Could the definition of what a dependency means, exactly, be included? Does it assess correlated gene expression? Does it assess activation of one gene upon knockout of the other? It would be helpful to spell this out in the main text. This may help understand the observed dependencies between the members of the shelterin complex and those of the CST complex.
We thank the Reviewer for this question. We have now added an explanation of the numerical value of a gene dependency to the “Telomere content associates with CST complex dependencies” section. In particular, a dependency score reflects how sensitive a particular cell line is to the inactivation of a particular gene, with more negative scores reflecting increased sensitivity.
c. Page 6 says "increased sensitivities to knockout of the CST complex components, which are key mediators of telomere capping and elongation termination (Chen et al., 2012), were correlated with lower telomere content." However, it seems that supplementary figure 3b indicates that the CST genes are positively correlated with telomere content? If this is an index of sensitivity, could this be indicated somewhere? (At the moment it seems similar to the TERT figure, so readers may read it in the same way which may be confusing). Could TERF1 be highlighted in Figure S3b please, DRIVE dependencies?
Consistent with the above comment, we have revised our wording to clarify our results here. A negative dependency score reflects increased sensitivity, so a positive correlation between telomere content and dependency scores suggests that cell lines with higher telomere content are less sensitive to inactivation of the CST complex members.
We have now highlighted TERF1 in Figure S3b.
d. About methylation studies. In page 11 it says "TERTp mutants exhibited strong and significant (P < 0.01) increases in ASM (allele-specific CpG methylation) in the promoter (n = 485), remaining gene body (n = 493), and exon 1 (n = 478) regions", but then in page 12, line 292 it says "The observation that TERT promoter mutants display a hypomethylated TERT locus…" – does this not directly contradict the statement above? Or is it referring to methylation on a specific region? Could this be clarified please?
We appreciate the opportunity to make this clarification. We state that TERTp mutants exhibited increases in ASM (allele-specific CpG methylation) yet show a hypomethylated TERT locus. This statement is consistent, because the increase in ASM is due to the hypomethylation (loss of methylation) at the TERT locus on a single allele. We have reworded this section to make this point clearer to the reader.
https://doi.org/10.7554/eLife.66198.sa2Article and author information
Author details
Funding
Prostate Cancer Foundation (Young Investigator Award)
- Franklin W Huang
The funders had no role in study design, data collection and interpretation, or the decision to submit the work for publication.
Acknowledgements
We thank Dr. Elizabeth Blackburn (UCSF) and Dr. Thomas Cech (University of Colorado Boulder) for providing insightful comments and feedback on the manuscript. We thank Dr. Hani Goodarzi (UCSF) for his generosity in providing storage and computing resources. We thank members of the Huang lab for their feedback.
Senior Editor
- Anna Akhmanova, Utrecht University, Netherlands
Reviewing Editor
- C Daniela Robles-Espinoza, International Laboratory for Human Genome Research, Mexico
Reviewers
- C Daniela Robles-Espinoza, International Laboratory for Human Genome Research, Mexico
- Floris Barthel, The Jackson Laboratory, United States
Publication history
- Received: January 2, 2021
- Preprint posted: January 24, 2021 (view preprint)
- Accepted: August 27, 2021
- Accepted Manuscript published: September 6, 2021 (version 1)
- Version of Record published: October 21, 2021 (version 2)
Copyright
© 2021, Hu et al.
This article is distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use and redistribution provided that the original author and source are credited.
Metrics
-
- 2,240
- Page views
-
- 306
- Downloads
-
- 3
- Citations
Article citation count generated by polling the highest count across the following sources: Crossref, PubMed Central, Scopus.
Download links
Downloads (link to download the article as PDF)
Open citations (links to open the citations from this article in various online reference manager services)
Cite this article (links to download the citations from this article in formats compatible with various reference manager tools)
Further reading
-
- Cancer Biology
- Computational and Systems Biology
Colorectal cancer (CRC) remains a challenging and deadly disease with high tumor microenvironment (TME) heterogeneity. Using an integrative multi-omics analysis and artificial intelligence-enabled spatial analysis of whole-slide images, we performed a comprehensive characterization of TME in colorectal cancer (CCCRC). CRC samples were classified into four CCCRC subtypes with distinct TME features, namely, C1 as the proliferative subtype with low immunogenicity; C2 as the immunosuppressed subtype with the terminally exhausted immune characteristics; C3 as the immune-excluded subtype with the distinct upregulation of stromal components and a lack of T cell infiltration in the tumor core; and C4 as the immunomodulatory subtype with the remarkable upregulation of anti-tumor immune components. The four CCCRC subtypes had distinct histopathologic and molecular characteristics, therapeutic efficacy, and prognosis. We found that the C1 subtype may be suitable for chemotherapy and cetuximab, the C2 subtype may benefit from a combination of chemotherapy and bevacizumab, the C3 subtype has increased sensitivity to the WNT pathway inhibitor WIKI4, and the C4 subtype is a potential candidate for immune checkpoint blockade treatment. Importantly, we established a simple gene classifier for accurate identification of each CCCRC subtype. Collectively our integrative analysis ultimately established a holistic framework to thoroughly dissect the TME of CRC, and the CCCRC classification system with high biological interpretability may contribute to biomarker discovery and future clinical trial design.
-
- Cancer Biology
In embryonal rhabdomyosarcoma (ERMS) and generally in sarcomas, the role of wild-type and loss or gain-of-function TP53 mutations remains largely undefined. Eliminating mutant or restoring wild-type p53 is challenging; nevertheless, understanding p53 variant effects on tumorigenesis remains central to realizing better treatment outcomes. In ERMS, >70% of patients retain wild-type TP53, yet mutations when present are associated with worse prognosis. Employing a kRASG12D-driven ERMS tumor model and tp53 null (tp53-/-) zebrafish, we define wild-type and patient-specific TP53 mutant effects on tumorigenesis. We demonstrate that tp53 is a major suppressor of tumorigenesis, where tp53 loss expands tumor initiation from <35% to >97% of animals. Characterizing three patient-specific alleles reveals that TP53C176F partially retains wild-type p53 apoptotic activity that can be exploited, whereas TP53P153D and TP53Y220C encode two structurally related proteins with gain-of-function effects that predispose to head musculature ERMS. TP53P153D unexpectedly also predisposes to hedgehog expressing medulloblastomas in the kRASG12D-driven ERMS-model.