The human leukemia virus HTLV-1 alters the structure and transcription of host chromatin in cis

  1. Anat Melamed
  2. Hiroko Yaguchi (corresponding author)
  3. Michi Miura
  4. Aviva Witkover
  5. Tomas W Fitzgerald
  6. Ewan Birney
  7. Charles RM Bangham (corresponding author)

  1. Imperial College London, United Kingdom
  2. The European Bioinformatics Institute (EMBL-EBI), United Kingdom
5 figures, 1 table and 4 additional files

Figures

Figure 1 with 11 supplements
HTLV-1 forms distant contacts with the host genome.

(A) Upper line: the HTLV-1 genome (green), with a long terminal repeat (LTR) at each end, is integrated into a clone-specific site in the human genome (grey). The q4C viewpoint (blue rectangle) is the NlaIII fragment within the HTLV-1 genome (nucleotide residues 6564–7246) which contains the CTCF binding site (CTCF-BS; black arrowhead). Lower line: the CTCF-BS (blue hexagon) in the provirus can dimerize with a CTCF-BS in the flanking host genome. (B) Chromatin contacts identified by q4C in two different clones. For each clone, the top panel depicts the q4C profile in the 5′ and 3′ host genome flanking the provirus (two biological duplicates), quantified as the normalized frequency of ligation events in overlapping windows (window width 10 kb, step 1 kb). On the horizontal axis, positive values denote positions downstream of the provirus (i.e. lying 3′ of the 3′ LTR); negative values denote upstream positions. VP – viewpoint in q4C (proviral integration site). Diamonds mark the positions of reproducible chromatin contact sites called by the peak caller (Materials and methods). CTCF panel – open arrowheads denote positions of CTCF-BS; the filled arrowhead denotes the CTCF-BS in the provirus. Genes panel shows RefSeq protein-coding genes in the flanking host genome. The q4C profiles of the remaining clones are shown in Figure 1—figure supplements 1–10. (C) Number of detected peaks in each clone. (D) Distance from detected q4C peaks to the respective proviral integration site.
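The overlapping-window quantification described in panel B (window width 10 kb, step 1 kb) can be illustrated with a minimal sketch. The function name, the input format (ligation-event coordinates relative to the viewpoint) and the normalization (division by the total number of events) are assumptions made for illustration; the normalization actually used is described in Materials and methods and may differ.

```python
def windowed_ligation_frequency(positions, start, end, width=10_000, step=1_000):
    """Count ligation events in overlapping windows, normalised to the total.

    positions: ligation-event coordinates in bp, relative to the viewpoint
               (hypothetical input format).
    Returns a list of (window_start, fraction_of_all_events) tuples.
    """
    total = len(positions)
    profile = []
    # Slide a half-open window [w_start, w_start + width) along the region.
    for w_start in range(start, end - width + 1, step):
        w_end = w_start + width
        count = sum(1 for p in positions if w_start <= p < w_end)
        # Assumed normalisation: fraction of all events in the sample.
        profile.append((w_start, count / total if total else 0.0))
    return profile


# Toy example: five ligation events within 15 kb downstream of the viewpoint.
events = [500, 1_500, 1_700, 9_000, 12_000]
profile = windowed_ligation_frequency(events, start=0, end=15_000)
```

Because consecutive windows share 9 kb, a single ligation event contributes to up to ten adjacent windows, which smooths the profile.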

https://doi.org/10.7554/eLife.36245.002
Figure 1—figure supplement 1
q4C and RNASeq data aligned – clone 6.25.

The distance from the integration site was chosen such that all called peaks are shown. For each clone, the top panel depicts the q4C profile in the infected chromosome in duplicate (normalized frequency of ligation events in overlapping 10 kb windows, step 1 kb). On the horizontal axis, positive values denote positions downstream of the provirus (i.e. lying 3′ of the 3′ LTR); negative values denote upstream positions. VP – viewpoint in q4C (proviral integration site). Diamonds mark the positions of reproducible chromatin contact sites called by the peak caller. CTCF panel – open arrowheads denote positions of CTCF-BS; the filled arrowhead denotes the CTCF-BS in the provirus. Genes panel shows RefSeq protein-coding genes in the genomic environment. RNA density – the normalized transcription density in 1 kb bins in the same (blue) or opposite (red) orientation compared to the HTLV-1 plus-strand. RNA ratio – the ratio of transcription density over the median of all clones in the same position, in the same (blue) or opposite (red) orientation. Clones also displayed in the main body of the paper are highlighted at the top of the page.

https://doi.org/10.7554/eLife.36245.003
Figure 1—figure supplement 2
q4C and RNASeq data aligned – clone 10.1.

See legend for Figure 1—figure supplement 1.

https://doi.org/10.7554/eLife.36245.004
Figure 1—figure supplement 3
q4C and RNASeq data aligned – clone 8.8.

See legend for Figure 1—figure supplement 1.

https://doi.org/10.7554/eLife.36245.005
Figure 1—figure supplement 4
q4C and RNASeq data aligned – clone 3.83.

See legend for Figure 1—figure supplement 1.

https://doi.org/10.7554/eLife.36245.006
Figure 1—figure supplement 5
q4C and RNASeq data aligned – clone 8.13. 

See legend for Figure 1—figure supplement 1.

https://doi.org/10.7554/eLife.36245.007
Figure 1—figure supplement 6
q4C and RNASeq data aligned – clone 11.50. 

See legend for Figure 1—figure supplement 1.

https://doi.org/10.7554/eLife.36245.008
Figure 1—figure supplement 7
q4C and RNASeq data aligned – clone 11.63. 

See legend for Figure 1—figure supplement 1.

https://doi.org/10.7554/eLife.36245.009
Figure 1—figure supplement 8
q4C and RNASeq data aligned – clone TBX4B. 

See legend for Figure 1—figure supplement 1.

https://doi.org/10.7554/eLife.36245.010
Figure 1—figure supplement 9
q4C and RNASeq data aligned – clone 11.65. 

See legend for Figure 1—figure supplement 1.

https://doi.org/10.7554/eLife.36245.011
Figure 1—figure supplement 10
q4C and RNASeq data aligned – clone 3.60. 

See legend for Figure 1—figure supplement 1.

https://doi.org/10.7554/eLife.36245.012
Figure 1—figure supplement 11
Position of q4C peaks relative to the HTLV-1 provirus. We defined 'upstream' peaks as q4C peaks that lie on the 5′ side of the 5′ LTR of the HTLV-1 provirus, and 'downstream' peaks as those which lie 3′ to the 3′ LTR.

(A) Significantly fewer peaks were found upstream of the integration site than downstream (15 vs 29; p=0.03, chi-squared goodness of fit test). (B) The distribution of absolute distance between each q4C peak and the integration site was compared between upstream and downstream peaks (p=0.13, Wilcoxon rank sum test). (C) The frequency of the presence of a CTCF binding site (CTCF-BS) in a q4C peak did not differ between upstream and downstream peaks (p=1, Fisher’s exact test).
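The comparison in panel A (15 upstream vs 29 downstream peaks) can be reproduced as a worked example. This is a minimal sketch that assumes the null hypothesis is an equal expected number of peaks on each side (22 and 22); the legend does not state the expected proportions, so that choice is an assumption. For one degree of freedom, the chi-squared tail probability can be computed with the complementary error function from the standard library.

```python
import math


def chi2_gof_two_categories(a, b):
    """Chi-squared goodness-of-fit test for two categories.

    Assumes equal expected counts under the null hypothesis.
    Returns (test statistic, p-value) with 1 degree of freedom.
    """
    expected = (a + b) / 2
    stat = (a - expected) ** 2 / expected + (b - expected) ** 2 / expected
    # For 1 degree of freedom, P(X > stat) = erfc(sqrt(stat / 2)).
    p_value = math.erfc(math.sqrt(stat / 2))
    return stat, p_value


stat, p_value = chi2_gof_two_categories(15, 29)
print(f"chi-squared = {stat:.2f}, p = {p_value:.3f}")
```

Under this assumption the statistic is 98/22 (about 4.45), and the p-value falls between 0.03 and 0.04, in line with the value reported in the legend.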

https://doi.org/10.7554/eLife.36245.013
Figure 2 with 5 supplements
The HTLV-1 provirus makes chromatin contacts in cis with the infected chromosome.

(A) The HTLV-1 provirus is present in one copy per cell. The infected chromosome (green) can be distinguished from the uninfected homologous chromosome (dark blue) by heterozygous single-nucleotide polymorphisms (SNPs), marked by the nucleotides above each chromosome. (B) The frequency of allele usage in unique q4C reads containing heterozygous SNPs (at least two reads per position) was measured to quantify the degree of allelic imbalance, that is, the degree of monoallelic usage present in q4C reads at a heterozygous SNP. Allelic imbalance ranges between 0 (biallelic, i.e. half of reads come from each allele) and 0.5 (monoallelic, i.e. all reads from one allele only). The dark blue line (above) shows the range of allele usage in the q4C reads; the light grey line (below) shows the allele usage for the same SNPs in the whole-genome sequencing reads. Curves were computed using LOESS regression. (C) The infected chromosome was distinguished from the homologous uninfected chromosome using q4C data (top panel) and chromosome-specific PCR (bottom panel). Top panel – heterozygous SNPs in DNA were phased computationally to identify the two haplotypes (A and B) (Materials and methods); the alleles present in q4C data were then assigned to the respective haplotype (circles). On the horizontal axis, positive values denote positions downstream of the provirus, and negative values denote positions upstream. Within at least 100 kb, all identified heterozygous SNP alleles mapped to only one of the two haplotypes. Bottom panel – haplotype assignment was confirmed using haplotype-specific PCR. Each nucleotide shown is a heterozygous SNP within 5 kb of the proviral integration site, identified in the respective clone by whole-genome sequencing. The SNPs were then mapped to the respective haplotype by Sanger sequencing of long-range products amplified by PCR either between the provirus and host genome (inf – infected haplotype) or across the provirus (uninf – uninfected haplotype). Further examples are shown in Figure 2—figure supplements 1–5.
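The allelic imbalance scale in panel B (0 for biallelic, 0.5 for monoallelic) is consistent with the absolute deviation of one allele's read fraction from 0.5. The sketch below uses that formula as an assumption, since the legend gives only the two endpoints; the function name and inputs are hypothetical.

```python
def allelic_imbalance(allele_a_reads, allele_b_reads):
    """Allelic imbalance at one heterozygous SNP.

    Assumed definition: |fraction of reads from allele A - 0.5|,
    which is 0 for equal usage and 0.5 when all reads come from one allele.
    """
    total = allele_a_reads + allele_b_reads
    if total == 0:
        raise ValueError("no reads cover this SNP")
    return abs(allele_a_reads / total - 0.5)


print(allelic_imbalance(5, 5))   # equal usage of both alleles
print(allelic_imbalance(10, 0))  # all reads from one allele
print(allelic_imbalance(3, 1))   # intermediate case
```

The measure is symmetric, so it does not matter which allele is labelled A.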

https://doi.org/10.7554/eLife.36245.015
Figure 2—figure supplement 1
Identification of infected chromosomes - clone 11.50.

The infected chromosome was distinguished from the homologous uninfected chromosome using q4C data (top panel) and chromosome-specific PCR (bottom panel) (further example shown in Figure 2C). Top panel – heterozygous SNPs in DNA were phased computationally to identify the two haplotypes (A and B) (see Materials and methods), and the alleles present in q4C data were then assigned to the respective haplotype (circles). On the horizontal axis, positive values denote positions downstream of the provirus and negative values denote positions upstream. Within at least 100 kb, all identified heterozygous SNP alleles mapped to only one of the two haplotypes. Bottom panel – haplotype assignment was confirmed using haplotype-specific PCR. Each nucleotide shown is a heterozygous SNP within 5 kb of the viral integration site. These SNPs were mapped to the respective haplotype by Sanger sequencing of long-range products amplified by PCR either between the provirus and host genome (inf – infected) or across the provirus (uninf – uninfected).

https://doi.org/10.7554/eLife.36245.016
Figure 2—figure supplement 2
Identification of infected chromosomes - clone 11.65.

See legend for Figure 2—figure supplement 1.

https://doi.org/10.7554/eLife.36245.017
Figure 2—figure supplement 3
Identification of infected chromosomes - clone 6.25.

See legend for Figure 2—figure supplement 1.

https://doi.org/10.7554/eLife.36245.018
Figure 2—figure supplement 4
Identification of infected chromosomes - clone 8.13.

See legend for Figure 2—figure supplement 1.

https://doi.org/10.7554/eLife.36245.019
Figure 2—figure supplement 5
Identification of infected chromosomes - clone TBX4B.

See legend for Figure 2—figure supplement 1.

https://doi.org/10.7554/eLife.36245.020
Figure 3 with 1 supplement
Dependency of virus-host contacts on CTCF binding.

(A) Of 44 contacts identified by q4C in the clones examined, 22 contained one CTCF-BS (N = 17) or two CTCF-BS (N = 5); the remaining 22 contacts did not contain a CTCF-BS. The frequency of one or more CTCF-BS in observed q4C peaks was higher than expected by chance (p=4.73 × 10^−4, Fisher’s exact test; see also Figure 3—figure supplement 1). (B) The polarity of the proviral CTCF-BS (filled arrowhead) and of the host CTCF-BS (open arrowhead) was determined. Of the CTCF-containing peaks whose polarity could be determined, convergent orientation (possible only for downstream peaks) was found in 46% of peaks, divergent orientation (possible only for upstream peaks) in 8% of peaks, and tandem orientation (possible for either upstream or downstream peaks) in 46% of peaks. (C) Distribution of q4C peak height (mean number of ligation events between replicates identified in each peak) in peaks containing 0, 1 or 2 CTCF-BS (coloured as in panel A): peaks that contained at least one CTCF-BS were significantly higher than those that lacked a CTCF-BS (p=0.025, Wilcoxon test). In addition, there was a significant correlation between mean q4C peak height and the number of overlapping CTCF-BS (p=0.0156, Spearman’s rank correlation test) (not illustrated). (D) The number of observed contacts was positively correlated with the number of CTCF-BS within 0.5 Mb of the proviral integration site (p=0.011, Pearson’s correlation test). (E–F) q4C analysis was carried out on a clone from an ATL-derived cell line (E) and a T-cell clone (F), in each case on wild-type cells (WT; top panels) and on cells after CRISPR-Cas9 knockout of the proviral CTCF-BS (bottom panels). The vertical axis shows the normalized number of q4C ligation events (overlapping windows 5 kb wide, with 1 kb steps). The bottom track in each panel shows the position of known RefSeq protein-coding genes; in clone TBX4B the provirus is inserted between exons of the gene PNPLA (shown in blue; see Results section).

https://doi.org/10.7554/eLife.36245.021
Figure 3—figure supplement 1
CTCF-BS occupancy at 4C peaks is higher than random expectation.

For each observed peak, we selected random control sites on the same chromosome, within 2 Mb of the same proviral integration site (the observed range) and with a width equal to that of the observed peak, and counted the number of overlapping CTCF binding sites (CTCF-BS). This process was repeated 100 times. Bars show the number of 4C peaks (left) or randomly matched ranges (median of 100; right) that did (blue) or did not (yellow) contain a CTCF-BS. Error bars indicate the interquartile range.
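The matched-control procedure described above can be sketched as a simple resampling loop. This is an illustration under stated assumptions, not the authors' implementation: it draws one control interval per observed peak in each iteration, places it uniformly within 2 Mb of the integration site, and treats each CTCF-BS as a single coordinate. All names, the seed and the toy coordinates are hypothetical.

```python
import random
import statistics


def contains_site(start, end, sites):
    """True if any site coordinate falls in the half-open interval [start, end)."""
    return any(start <= s < end for s in sites)


def matched_control_counts(peaks, ctcf_sites, integration_site,
                           max_dist=2_000_000, n_iter=100, seed=0):
    """For each iteration, count width-matched random intervals containing a CTCF-BS.

    peaks: list of (start, end) observed peak intervals on one chromosome.
    Returns a list with one count per iteration.
    """
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    counts = []
    for _ in range(n_iter):
        n_with_ctcf = 0
        for p_start, p_end in peaks:
            width = p_end - p_start
            # Assumed sampling scheme: uniform start within +/- max_dist.
            c_start = rng.randint(integration_site - max_dist,
                                  integration_site + max_dist - width)
            if contains_site(c_start, c_start + width, ctcf_sites):
                n_with_ctcf += 1
        counts.append(n_with_ctcf)
    return counts


# Toy data (made-up coordinates, for illustration only).
peaks = [(50_100_000, 50_110_000), (49_700_000, 49_705_000)]
ctcf_sites = [50_105_000, 49_000_000, 51_500_000]
counts = matched_control_counts(peaks, ctcf_sites, integration_site=50_000_000)
observed = sum(contains_site(s, e, ctcf_sites) for s, e in peaks)
print(observed, statistics.median(counts))
```

Comparing `observed` with the median and spread of `counts` corresponds to the left and right bars of the figure.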

https://doi.org/10.7554/eLife.36245.022
Figure 4 with 2 supplements
Integration site-specific upregulation of host transcription.

(A) In each column, the green arrow indicates the HTLV-1 proviral integration site in the clone indicated at the top of the column. Each row shows the transcription density (normalized RNA-seq read count) flanking that genomic position in the clone indicated at the right-hand side. In each case, transcription orientation and positions are shown relative to the integrated provirus. Read density shown in blue indicates transcription in the same orientation as the proviral plus-strand; red indicates transcription in the antisense orientation to the proviral plus-strand. (B) q4C profiles of two clones aligned with the transcription density within 300 kb of the proviral integration site. The RNA ratio shows the ratio of transcription density in a given bin (number of reads in 1 kb bin/total number of reads in sample) in the target clone, divided by the median expression density of all clones in that bin. Colours represent expression in the same sense (blue) or opposite sense (red) to the HTLV-1 plus-strand. Data on the remaining clones are shown in Figure 1—figure supplements 1–10. (C) Left panel: median ratio of transcription density of all clones, aligned on the integration site (1 kb bins, up to 0.5 Mb from the integration site). Right panel: median ratio of transcription density at 10 genomic positions, selected at random from a gap-excluded hg19 reference genome. (D) Analysis carried out as in panel C, separately for clones expressing HTLV-1 plus-strand transcripts at a level greater than (left panel) or less than (right panel) the median of all clones.
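The RNA ratio defined in panel B can be written out as a short sketch: the density of a bin is its read count divided by the total reads in the sample, and the ratio divides the target clone's density by the median density of all clones in that bin. The data layout and clone labels below are hypothetical, and returning NaN for bins with a zero median is an assumption, since the legend does not say how such bins were handled.

```python
import statistics


def transcription_density(bin_counts, total_reads):
    """Reads in each 1 kb bin divided by the total number of reads in the sample."""
    return [c / total_reads for c in bin_counts]


def rna_ratio(target, clones):
    """Per-bin density of the target clone over the median density of all clones.

    clones: dict mapping clone name -> (per-bin read counts, total reads in sample).
    """
    densities = {name: transcription_density(counts, total)
                 for name, (counts, total) in clones.items()}
    ratios = []
    for i, d in enumerate(densities[target]):
        median_d = statistics.median([dens[i] for dens in densities.values()])
        # Assumed handling of empty bins: report NaN rather than divide by zero.
        ratios.append(d / median_d if median_d > 0 else float("nan"))
    return ratios


# Toy example with three clones and two bins (made-up counts).
clones = {
    "clone_A": ([10, 40], 100),
    "clone_B": ([10, 10], 100),
    "clone_C": ([10, 10], 100),
}
ratios = rna_ratio("clone_A", clones)
print(ratios)
```

In this toy example the second bin of `clone_A` has four times the median density, which would count as clone-specific transcription under a twofold threshold.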

https://doi.org/10.7554/eLife.36245.023
Figure 4—figure supplement 1
Upregulation of transcription within 100 kb of integration site.

Normalized ratio of transcription density in each clone (Figure 1, Figure 1—figure supplements 1–10) between 100 kb upstream and 100 kb downstream of the respective proviral integration site; transcription is oriented relative to the proviral plus-strand. Data are normalized within each clone to the highest ratio value within these 200 kb.

https://doi.org/10.7554/eLife.36245.024
Figure 4—figure supplement 2
Examples of clone-specific aberrant transcription and splicing.

(A) RNA-seq reads (upper panel) and splice junctions (boxed, lower panel) flanking the provirus in clone 3.60 and at the same genomic location in clone 3.83. Transcription in the same orientation as the proviral plus-strand is shown in blue; transcription in the antisense orientation to the proviral plus-strand in red. (B) Splice junctions flanking the provirus in clone 8.8 and at the same genomic location in clones 6.25 and 11.63, coloured relative to the proviral plus-strand as in A. The green arrows indicate the HTLV-1 proviral integration sites respectively in clone 3.60 (A) and clone 8.8 (B).

https://doi.org/10.7554/eLife.36245.025
Figure 5
Clone-specific host transcription is derived from the infected chromosome.

(A) Allelic imbalance (AI) denotes the degree of monoallelic usage of identified SNPs: AI = 0 indicates biallelic transcription; AI = 0.5 indicates monoallelic transcription. In each clone, the AI was quantified in transcripts within 2 Mb of the proviral integration site and compared with the value at that site in all other clones. Clone-specific transcription (transcription density in the clone carrying the provirus twofold or greater than the median; 1 kb bins) was monoallelic; shared transcription was biallelic. In bins with little or no change in transcription from the median, the allelic imbalance did not differ significantly; in bins with clone-specific expression (twofold or greater increase), the allelic imbalance was significantly greater (more monoallelic) in the clone carrying the integration site than in the remaining clones (p=6.7 × 10^−12, Wilcoxon test). (B) Transcription density depicted as in Figure 4A, analysed by haplotype (see Figure 2). Columns are coloured by the mean frequency of infected or uninfected alleles (1 kb bins). White columns did not include SNPs that could be assigned to a single haplotype. (C) Median ratio of transcription density (log scale) in 1 kb bins containing a heterozygous SNP, coloured by the frequency of alleles derived from the infected (green) or uninfected (blue) haplotypes. (D) The SNP alleles expressed at ≥2 × median level were over-represented in the infected haplotype.

https://doi.org/10.7554/eLife.36245.026

Tables

Table 1
T cell clones used.
https://doi.org/10.7554/eLife.36245.014
Subject   Clone(s)
TBJ       3.60, 3.83
TCX       8.13, 8.8
TCT       10.1
TBW       11.50, 11.63, 11.65, 13.50(U)
TBX       TBX4B
HAY       6.25, 6.30(U)

Additional files

Supplementary file 1

T cell clones used.

Extended data on clones shown in Table 1. All subjects are HTLV-1 carriers with HAM/TSP, except for HAY who is an asymptomatic carrier of HTLV-1. tax expression of ‘high’ or ‘low’ denotes whether the frequency of plus-strand viral transcripts was higher or lower than the median, respectively.

https://doi.org/10.7554/eLife.36245.027
Supplementary file 2

Schematic diagram comparing the conventional 4C (A) and q4C (B) protocols.

In the conventional protocol (A), after the crosslinked chromatin is digested with the first restriction enzyme (first RE) and the free ends are ligated, the DNA is digested with a second restriction enzyme (second RE), followed by religation and inverse PCR to amplify viewpoint (VP)-linked genomic regions. In q4C, we modified the 4C protocol (Krijger and de Laat, 2016) by applying the approach used in our previously described linker-mediated (LM)-PCR protocol (Gillet et al., 2011) for identifying and quantifying proviral integration sites. In q4C, sonication is used instead of the second restriction enzyme to process DNA circles. Linkers with a specific 6 bp tag are added to the sonicated DNA. The end of the VP and a fragment of genomic DNA are amplified by LM-PCR. In this example, three ligation events occurred between the VP (red) and a genomic region (green) at ligation site I, and one event at ligation site II (yellow). Because the DNA shear site is (approximately) random, the amplicon from each cell has a different shear site. The abundance of ligation events at each ligation site is quantified by counting the number of different shear sites.
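The counting principle in the last sentence above (abundance equals the number of distinct shear sites per ligation site) can be illustrated with a minimal sketch. The input format, a list of (ligation site, shear site) pairs with one pair per sequenced read, is a hypothetical simplification; PCR duplicates share a shear site and therefore collapse to a single event.

```python
from collections import defaultdict


def count_ligation_events(reads):
    """Count distinct shear sites observed at each ligation site.

    reads: iterable of (ligation_site, shear_site) pairs, one per sequenced read.
    Reads with the same ligation site and shear site are treated as PCR
    duplicates of one original ligation event.
    """
    shear_sites = defaultdict(set)
    for ligation_site, shear_site in reads:
        shear_sites[ligation_site].add(shear_site)
    return {site: len(shears) for site, shears in shear_sites.items()}


# Toy reads mirroring the schematic: three events at site I, one at site II.
reads = [
    ("I", 101), ("I", 101),   # PCR duplicates of one event
    ("I", 140),
    ("I", 187),
    ("II", 95), ("II", 95),   # PCR duplicates of one event
]
counts = count_ligation_events(reads)
print(counts)
```

Counting shear sites rather than reads makes the estimate insensitive to differences in PCR amplification efficiency between fragments.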

https://doi.org/10.7554/eLife.36245.028
Supplementary file 3

q4C data analysis steps.

Summary of the main steps in the analysis of q4C data. See Materials and methods for details.

https://doi.org/10.7554/eLife.36245.029
Transparent reporting form
https://doi.org/10.7554/eLife.36245.030

Cite this article

  1. Anat Melamed
  2. Hiroko Yaguchi
  3. Michi Miura
  4. Aviva Witkover
  5. Tomas W Fitzgerald
  6. Ewan Birney
  7. Charles RM Bangham
(2018)
The human leukemia virus HTLV-1 alters the structure and transcription of host chromatin in cis
eLife 7:e36245.
https://doi.org/10.7554/eLife.36245