RNA virus attenuation by codon pair deoptimisation is an artefact of increases in CpG/UpA dinucleotide frequencies

  1. Fiona Tulloch
  2. Nicky J Atkinson
  3. David J Evans
  4. Martin D Ryan
  5. Peter Simmonds (corresponding author)
  1. University of St Andrews, United Kingdom
  2. University of Edinburgh, United Kingdom
  3. University of Warwick, United Kingdom
6 figures, 4 tables and 1 additional file

Figures

Figure 1 with 4 supplements
(A) CP usage in human coding sequences arranged in a 64 x 64 grid.

CP frequencies relative to those expected from nucleotide and amino acid frequencies (CP bias) are colour coded in a heat map. The primary division on the x-axis is by identity of the 3–1 dinucleotide as annotated. Within these, further divisions within each of the 16 columns show the identity of the nucleotide at position 2 (A, C, G or U). The y-axis records nucleotides at positions 4 (4 main divisions on the y-axis), 1 (4 subdivisions of position 4) and 6 (4 subdivisions of position 1). Positions of unused codon pairs containing a 5′ stop codon (translated as *|x) are shaded in grey. CP usage heat maps for A. thaliana, C. elegans and E. coli coding sequences are shown in Figure 1—figure supplement 2A–2C.

(B) Distribution of codon pair bias scores in human coding sequences; separate labelling of the 64 codon pairs with CpG (red) or UpA (blue) across the codon junction (3–1) demonstrates their consistent under-representation relative to their component nucleotide and amino acid frequencies. The distributions of codon pair scores for A. thaliana, C. elegans and E. coli are shown in Figure 1—figure supplement 3A–3C. Correlations of codon pair scores between human coding sequences and those of A. thaliana, C. elegans and E. coli are shown in Figure 1—figure supplement 4.

https://doi.org/10.7554/eLife.04531.004
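The CP bias plotted in Figure 1 compares the observed count of each codon pair with an expected count. As a rough illustration only, the sketch below uses the codon pair score formulation commonly attributed to Coleman et al., 2008: ln(N_AB / ((N_A × N_B) / (N_X × N_Y) × N_XY)), where A and B are adjacent codons encoding amino acids X and Y. Whether SSE normalises in exactly this way is an assumption, and the function name `codon_pair_scores` and the toy ORFs are hypothetical.

```python
from collections import Counter
from itertools import product
from math import log

# Standard genetic code, codons enumerated with bases in T, C, A, G order.
BASES = "TCAG"
AMINO = ("FFLLSSSSYY**CC*W" "LLLLPPPPHHQQRRRR"
         "IIIMTTTTNNKKSSRR" "VVVVAAAADDEEGGGG")
CODON_TO_AA = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def codon_pair_scores(orfs):
    """Return {(codonA, codonB): score} for in-frame coding sequences.

    Assumed formula (Coleman-style): ln(observed / expected), where
    expected = N_A * N_B / (N_X * N_Y) * N_XY.
    Sequences must contain only A, C, G and T/U.
    """
    codons, aas = Counter(), Counter()
    pairs, aa_pairs = Counter(), Counter()
    for orf in orfs:
        orf = orf.upper().replace("U", "T")
        usable = len(orf) - len(orf) % 3  # drop any trailing partial codon
        triplets = [orf[i:i + 3] for i in range(0, usable, 3)]
        for c in triplets:
            codons[c] += 1
            aas[CODON_TO_AA[c]] += 1
        for a, b in zip(triplets, triplets[1:]):
            pairs[(a, b)] += 1
            aa_pairs[(CODON_TO_AA[a], CODON_TO_AA[b])] += 1

    scores = {}
    for (a, b), n_ab in pairs.items():
        x, y = CODON_TO_AA[a], CODON_TO_AA[b]
        expected = codons[a] * codons[b] / (aas[x] * aas[y]) * aa_pairs[(x, y)]
        scores[(a, b)] = log(n_ab / expected)
    return scores


# Toy input: two Ala-Ala ORFs in which each codon only ever pairs with itself.
toy_scores = codon_pair_scores(["GCTGCT", "GCCGCC"])
```

In this toy set the pair GCT-GCT is observed once but expected 0.5 times, so its score is positive (over-represented); pairs observed exactly as often as expected score zero.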
Figure 1—figure supplement 1
Distribution of relative synonymous codon usage values for degenerate codons in the human genome (stop codons were excluded).

Codons with CpG and UpA at the 1–2 or 2–3 codon position are shaded as indicated in the key (data derived from http://bioinformatics.weizmann.ac.il/databases/codon).

https://doi.org/10.7554/eLife.04531.005
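For readers unfamiliar with relative synonymous codon usage (RSCU), the following is a minimal sketch of the standard definition: the observed count of a codon divided by the mean count of all synonymous codons for the same amino acid. Stop codons and single-codon amino acids are skipped, matching the restriction to degenerate codons in the caption. The function name `rscu` and the toy sequence are hypothetical; the values plotted in the figure come from the database cited above, not from this code.

```python
from collections import Counter, defaultdict
from itertools import product

# Standard genetic code, codons enumerated with bases in T, C, A, G order.
BASES = "TCAG"
AMINO = ("FFLLSSSSYY**CC*W" "LLLLPPPPHHQQRRRR"
         "IIIMTTTTNNKKSSRR" "VVVVAAAADDEEGGGG")
CODON_TO_AA = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def rscu(orfs):
    """Return {codon: RSCU} for degenerate sense codons.

    RSCU = count(codon) * family_size / total count of the synonymous family.
    Amino acids absent from the input are omitted.
    """
    counts = Counter()
    for orf in orfs:
        orf = orf.upper().replace("U", "T")
        usable = len(orf) - len(orf) % 3  # drop any trailing partial codon
        for i in range(0, usable, 3):
            counts[orf[i:i + 3]] += 1

    families = defaultdict(list)
    for codon, aa in CODON_TO_AA.items():
        if aa != "*":  # stop codons excluded
            families[aa].append(codon)

    values = {}
    for family in families.values():
        total = sum(counts[c] for c in family)
        if len(family) < 2 or total == 0:
            continue  # skip Met/Trp and amino acids not observed
        for c in family:
            values[c] = counts[c] * len(family) / total
    return values


# Toy input: four alanine codons, three GCT and one GCC.
toy_rscu = rscu(["GCTGCTGCTGCC"])
```

An RSCU of 1 means a codon is used as often as expected under equal use of synonymous codons; values below 1 indicate under-use.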
Figure 1—figure supplement 2
CP scores of codon pairs of (A) A. thaliana, (B) C. elegans and (C) E. coli ORFeomes.

The primary division on the x-axis is by identity of the 3–1 dinucleotide (labelled on y-axis), divisions within each column show the identity of codon position 2. The y-axis records codon positions 5 (1 cycle), 1 (4 cycles) and 6 (16 cycles). Positions of codon pairs translated as *|x are shaded grey.

https://doi.org/10.7554/eLife.04531.006
Figure 1—figure supplement 3
Distribution of codon pair scores for other organisms: (A) A. thaliana, (B) C. elegans and (C) E. coli, with separate representation of codon pairs with CpG and UpA across the codon junction.
https://doi.org/10.7554/eLife.04531.007
Figure 1—figure supplement 4
Correlation between representations of human codon pairs (x-axis) and those of other organisms (y-axis): (A) A. thaliana, (B) C. elegans and (C) E. coli.
https://doi.org/10.7554/eLife.04531.008
Figure 2
Codon pair scores and numbers of CpG and UpA dinucleotides in native (WT) and mutated regions of E7.

Mean CP scores for Regions 1 and 2 combined are shown on the x-axis; the total numbers of CpG and UpA dinucleotides in each sequence are shown on the y-axis. The histogram shows CP scores for the 35,170 human mRNA sequences >200 bases in length (mean 0.072; standard deviation ±0.031).

https://doi.org/10.7554/eLife.04531.009
Figure 3
Replication of WT and mutants of E7 with altered CP and dinucleotide frequencies.

Bars are shaded diagrammatically based on their relative CpG/UpA composition. RD cells were infected with E7 WT or mutants at an MOI of 0.03, and infectious titres were quantified at 8, 18 and 30 hr post inoculation (p.i.) by TCID50 determination. Results are the mean of three biological replicates; error bars show standard errors of the mean.

https://doi.org/10.7554/eLife.04531.011
Figure 4 with 1 supplement
RD cells were co-infected with pairs of WT (W|W) and E7 mutants at equal MOI and the supernatant serially passaged through cells after development of CPE. RNA was isolated and the composition of each virus determined through selective restriction digests using enzymes listed in Table 3.

(A) Examples of three competition assays showing cleavage patterns of individual viruses (lanes 1, 2), the starting inoculum (lane 3) and, in lanes 4 and 5, two biological replicates after 10 (panels 1, 2) or 5 (panel 3) passages. Results from the other competition assays are shown in Figure 4—figure supplement 1. (B) Summary of pairwise fitness comparisons of viruses, with outcomes for the viruses listed in columns at passages 5 and 10 indicated by colour shading. For example, Min-E and WT showed equal fitness (yellow shading), and cu|cu outcompeted WT by passage 5 (red) and Max-U by passage 10.

https://doi.org/10.7554/eLife.04531.012
Figure 4—figure supplement 1
Competition assays between E7 mutants showing competing variants (lanes 1 and 2) and the outcome at the indicated passage number (lane 3) for each.
https://doi.org/10.7554/eLife.04531.013
Figure 5 with 1 supplement
Translation of RNA templates generated from WT and mutant E7 cDNAs in a rabbit reticulocyte cell free assay.

Assignments of bands to E7 proteins were based on molecular weights on SDS-PAGE. A comparison of densitometry values for viral proteins is shown in Figure 5—figure supplement 1.

https://doi.org/10.7554/eLife.04531.016
Figure 5—figure supplement 1
Translation efficiencies estimated by densitometry of band intensities of viral proteins translated in a rabbit reticulocyte cell free assay.

Translation efficiencies of mutant E7 cDNAs were quantified relative to expression from the WT template. Bars show mean values for seven viral proteins; error bars show standard errors of the mean.

https://doi.org/10.7554/eLife.04531.017
Figure 6
Comparison of codon pair scores generated by SSE using a dataset of 35,770 human mRNA sequences (y-axis) with those used in a previous analysis (Coleman et al., 2008).
https://doi.org/10.7554/eLife.04531.018

Tables

Table 1

Relationship between codon pair de-optimisation, CpG and UpA frequencies and virus fitness reduction

https://doi.org/10.7554/eLife.04531.003
                              WT                   CPD
Virus     Gene        Prop'n  CP bias  CpG   UpA   CP bias  CpG   UpA   Replication reduction  Ref
Poliovirus
PV-X      Capsid      14.8%   −0.03    0.52  0.75  −0.46    1.34  1.25  ×25                    Coleman et al., 2008
PV-XY     Capsid      25.9%   −0.03    0.54  0.75  −0.46    1.31  1.27  ×400
Influenza A virus*
HAMin     Segs.4      11.4%   0.02     0.43  0.64  −0.42    1.65  1.11  ×3.5                   Mueller et al., 2010
HA/NPMin  Segs.4,5    21.3%   0.02     0.44  0.55  −0.42    1.56  1.14  ×14
PR83F     Segs.1,4,5  29.1%   0.01     0.43  0.53  −0.41    1.55  1.07  ×35
HIV-1
A         gag         4.6%    0.03     0.47  1.04  −0.43    1.43  1.25  ×7                     Martrus et al., 2013
B         gag         4.7%    0.08     0     0.91  −0.37    1.22  1.15  ×3
C         gag         4.8%    0.03     0.31  1.00  −0.38    1.50  1.09  ×8
D         gag         2.1%    −0.02    0     0.49  −0.42    1.47  0.99  ×1.5
PRRSV
SAVE5     gp5         2.6%    −0.06    0.63  0.73  −0.38    1.37  1.14  ×4                     Ni et al., 2014
  1. *

    Codon pair minimised sequences of IAV were not provided in (Mueller et al., 2010), so for the purposes of comparison they were reconstructed in SSE. Note that the CP scores described in Table 1 of that paper (−0.386, −0.420 and −0.421 for PB1, HA and NP respectively) are not minimum scores; the minimum scores are in fact −0.533, −0.585 and −0.602. CP score minimisation in the current study was therefore targeted to the former (published) values. Although the sequences generated by SSE were not identical to those obtained previously, they would be expected to show a similar distortion of dinucleotide frequencies to those used in the previous study (Mueller et al., 2010).

  2. Mutated region only (positions 147–542 in gp5).

  3. Data from replication assay in PAM cells.

Table 2

Composition and codon usage of E7 WT and mutant insert sequences

https://doi.org/10.7554/eLife.04531.010
                                        CpG                   UpA                   Codon usage
Region  Sequence (Symbol)  G+C content  Total*     O/E ratio  Total*     O/E ratio  CAI    ENc   CP bias
1       Native (WT)        47.6%        51 (−)     0.730      62 (−)     0.742      0.685  56.5  −0.043
        Permuted (P)       47.6%        51 (0)     0.730      62 (0)     0.742      0.694  55.8  −0.025
        CpG/UpAL (cu)      47.5%        0 (−51)    0          19 (−43)   0.227      0.686  43.5  0.087
        Max-U              50.1%        47 (−4)    0.610      43 (−19)   0.573      0.708  49.6  0.328
        Min-E              47.5%        51 (0)     0.736      62 (0)     0.735      0.748  54.3  −0.131
        Min-U              47.5%        69 (+18)   0.992      76 (+14)   0.939      0.709  58.3  −0.134
        Min-H              49.8%        106 (+55)  1.400      79 (+17)   0.981      0.696  49.2  −0.130
2       Native (WT)        47.1%        18 (−)     0.320      48 (−)     0.695      0.743  53.2  0.015
        Permuted (P)       47.6%        18 (0)     0.320      48 (0)     0.695      0.739  49.0  0.013
        CpG/UpAL (cu)      48.5%        0 (−18)    0          48 (0)     0.214      0.739  47.2  0.118
        Max-U              46.3%        24 (+6)    0.440      43 (−3)    0.601      0.750  46.1  0.311
        Min-E              45.7%        18 (0)     0.343      48 (0)     0.657      0.785  53.3  −0.091
        Min-U              47.4%        37 (+19)   0.649      50 (+2)    0.738      0.767  57.6  −0.083
        Min-H              47.8%        68 (+50)   1.172      65 (+15)   0.970      0.715  49.7  −0.085
  1. *

    Total number of CpG and UpA dinucleotides in sequence. Changes in numbers between mutated and original WT sequence are indicated in parentheses.

  2. Ratio of observed dinucleotide frequency (O) to that expected based on mononucleotide composition (E), that is, f(CpG)/(f(C) × f(G)).

  3. Values deliberately changed are shown in red (maximised) and blue (minimised).
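The Total and O/E columns of Table 2 can be illustrated with a short sketch. It counts a dinucleotide in a sequence and divides its observed frequency by the product of the two mononucleotide frequencies, that is, f(CpG)/(f(C) × f(G)). Taking the dinucleotide frequency over the n − 1 overlapping positions is an assumption, as the exact normalisation used by SSE is not stated here; the function names and the toy sequence are hypothetical.

```python
def count_dinucleotide(seq, dinuc):
    """Count occurrences of a dinucleotide (e.g. 'CG' or 'UA') in seq."""
    seq = seq.upper().replace("T", "U")
    dinuc = dinuc.upper().replace("T", "U")
    return sum(1 for i in range(len(seq) - 1) if seq[i:i + 2] == dinuc)


def dinucleotide_oe(seq, dinuc):
    """Observed/expected ratio: f(XpY) / (f(X) * f(Y)).

    f(XpY) is taken over the n - 1 dinucleotide positions (an assumption);
    f(X) and f(Y) are mononucleotide frequencies over n bases.
    """
    seq = seq.upper().replace("T", "U")
    dinuc = dinuc.upper().replace("T", "U")
    n = len(seq)
    if n < 2:
        raise ValueError("sequence must contain at least two bases")
    observed = count_dinucleotide(seq, dinuc) / (n - 1)
    expected = (seq.count(dinuc[0]) / n) * (seq.count(dinuc[1]) / n)
    return observed / expected if expected else float("nan")


# Toy sequence, not taken from E7.
toy = "ACGUACGUUA"
cpg_total = count_dinucleotide(toy, "CG")
upa_total = count_dinucleotide(toy, "UA")
cpg_oe = dinucleotide_oe(toy, "CG")
```

A ratio below 1 indicates that the dinucleotide occurs less often than its component base frequencies would predict, as for CpG and UpA in the WT rows of the table.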

Table 3

Enzymes used in selective digests for competition assays

https://doi.org/10.7554/eLife.04531.014
Virus 1  Virus 2  Region  Enzyme  Target
W|W      P|P      1       SpeI    Permuted
W|W      Max-U    1       SacI    Max
W|W      Min-E    1       NcoI    WT
W|W      Min-U    1       NcoI    WT
W|W      Min-H    1       EcoRV   WT
W|W      cu|cu    1       EcoRV   WT
P|P      cu|cu    1       SpeI    Permuted
Max-U    P|P      1       SpeI    Permuted
Max-U    cu|cu    1       SacI    Max
Min-E    Min-U    1       ClaI    Min-U
Min-E    Min-H    1       EcoRV   Min-E
Min-U    Min-H    1       ClaI    Min-U
Table 4

Correlation between fitness ranking and sequence composition

https://doi.org/10.7554/eLife.04531.015
Variable                Spearman R  p value
CpG/UpA*                1.0         <0.001
CP bias                 −0.70       0.1 (n.s.)
CAI                     −0.334      >0.5 (n.s.)
ENc                     0.593       >0.5 (n.s.)
G + C content           0.075       >0.5 (n.s.)
Translation efficiency  −0.074      >0.5 (n.s.)
  1. *

    Number of CpG and UpA dinucleotides in insert region.

  2. From values tabulated in (Ramsey, 1989).

  3. n.s.: not significant.
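The Spearman R values in Table 4 are rank correlations. The sketch below shows one way such a coefficient can be computed: both variables are converted to ranks (ties share their mean rank) and the Pearson correlation of the ranks is taken. The function names are hypothetical and the example numbers are made up for illustration; they are not the fitness rankings or compositions from this study, and p values (taken from published tables in the original) are not computed here.

```python
from math import sqrt


def average_ranks(values):
    """Rank values from 1..n, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def spearman_r(x, y):
    """Spearman rank correlation: Pearson correlation of the ranks."""
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(rx)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = sqrt(sum((a - mx) ** 2 for a in rx))
    sy = sqrt(sum((b - my) ** 2 for b in ry))
    return cov / (sx * sy)


# Made-up illustration: dinucleotide totals versus a fitness rank (1 = fittest).
hypothetical_dinucleotide_totals = [20, 95, 110, 150, 240]
hypothetical_fitness_rank = [1, 2, 3, 4, 5]
r = spearman_r(hypothetical_dinucleotide_totals, hypothetical_fitness_rank)
```

Because only the ordering matters, any monotonic relationship between the two variables gives a coefficient of 1 or −1, whatever the spacing of the raw values.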

Additional files

Cite this article

  1. Fiona Tulloch
  2. Nicky J Atkinson
  3. David J Evans
  4. Martin D Ryan
  5. Peter Simmonds
(2014)
RNA virus attenuation by codon pair deoptimisation is an artefact of increases in CpG/UpA dinucleotide frequencies
eLife 3:e04531.
https://doi.org/10.7554/eLife.04531