
Epistasis and entrenchment of drug resistance in HIV-1 subtype B

  1. Avik Biswas
  2. Allan Haldane
  3. Eddy Arnold
  4. Ronald M Levy (corresponding author)
  1. Temple University, United States
  2. Rutgers University, United States
Research Article
Cite this article as: eLife 2019;8:e50524 doi: 10.7554/eLife.50524
9 figures, 6 tables, 2 data sets and 1 additional file

Figures

Figure 1 with 1 supplement
The Potts model predicts residue frequencies.

(A) Schematic showing that the Potts model can be used to classify sequences by how likely a residue α is to appear at position i of a sequence S, using the background-dependent probability, (S,i,α). (B) The observed frequency of the resistance mutation L90M in drug-experienced HIV-1 proteases matches the Potts-predicted frequency in sequence clusters binned by Potts-predicted frequency in steps of 0.1 (blue circles), with statistical error (green). Circle diameters represent the number of sequences. The inset shows the significant overlap in Hamming distance between sequences with low predicted mutant frequencies (0.2 to 0.3, blue) and high predicted mutant frequencies (0.7 to 0.8, pink), illustrating the difficulty of such a classification based on Hamming distance. (C) Receiver operating characteristic (ROC) curves comparing the Potts model and Hamming distance as classifiers of mutational probability for L90M in HIV protease. (D) The average absolute error between the observed and the Potts-predicted mutational frequencies for the major drug-resistance mutations in three HIV-1 drug targets. The average absolute error is calculated by sorting the sequences in ascending order of predicted frequency, binning them so that each bin holds roughly the same number of sequences (as shown in Figure 1—figure supplement 1), and averaging the absolute error over the bins.

https://doi.org/10.7554/eLife.50524.002
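The ROC comparison in panel C can be summarized by the area under the curve (AUC). The following is a minimal sketch, not the authors' code: it assumes each sequence has a classifier score (for example, a Potts-predicted probability, or a score derived from Hamming distance) and a 0/1 label marking whether the mutation is present. The function name and the toy numbers are illustrative only.

```python
def roc_auc(scores, labels):
    """Area under the ROC curve, computed as the probability that a randomly
    chosen mutation-carrying sequence scores higher than a randomly chosen
    non-carrier (ties count as one half)."""
    positives = [s for s, y in zip(scores, labels) if y == 1]
    negatives = [s for s, y in zip(scores, labels) if y == 0]
    if not positives or not negatives:
        raise ValueError("need at least one sequence of each class")
    # Pairwise comparison; quadratic in the number of sequences,
    # which is fine for a sketch but slow for large alignments.
    wins = sum((p > n) + 0.5 * (p == n) for p in positives for n in negatives)
    return wins / (len(positives) * len(negatives))


# Made-up scores: a perfectly separating classifier and an uninformative one.
perfect = roc_auc([0.9, 0.8, 0.3, 0.1], [1, 1, 0, 0])
uninformative = roc_auc([0.5, 0.5, 0.5, 0.5], [1, 0, 1, 0])
print(perfect, uninformative)
```

An AUC near 1 indicates that the score separates carriers from non-carriers well, while an AUC near 0.5 indicates a classifier no better than chance.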
Figure 1—figure supplement 1
Observed vs predicted frequencies for calculating the average absolute error in Figure 1D.

The average absolute error in Figure 1D is calculated by sorting the sequences in ascending order of predicted frequency, binning them so that each bin holds roughly the same number of sequences, and averaging the absolute error of the bins. This procedure avoids the finite-sampling errors encountered with the more restrictive method of binning by Potts frequency from 0 to 1 in steps of 0.1 (as in Figure 1B), especially when the mutation frequency is small. The observed vs predicted frequencies are shown for the primary drug-resistance mutations V82A, occurring in response to PIs in HIV-1 protease (A); N155H, occurring in response to INSTIs in HIV-1 IN (B); M184V, occurring in response to NRTIs in HIV-1 reverse transcriptase (C); and Y181C/G, occurring in response to NNRTIs in HIV-1 reverse transcriptase (D). Diameters of the blue circles represent the number of sequences in each bin, and error bars are shown as green cross-hairs.

https://doi.org/10.7554/eLife.50524.003
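The equal-count binning described above can be sketched as follows. This is an illustrative reading of the caption, not the authors' implementation: it assumes one predicted probability and one 0/1 presence flag per sequence, and the function name and number of bins are hypothetical choices.

```python
def average_absolute_error(predicted, observed, n_bins):
    """Mean |observed frequency - mean predicted probability| over equal-count bins.

    predicted: per-sequence predicted probability of carrying the mutation
    observed:  per-sequence flag, 1 if the mutation is present, else 0
    """
    if not predicted or len(predicted) != len(observed):
        raise ValueError("need equally long, non-empty inputs")
    # Sort sequences in ascending order of predicted probability.
    pairs = sorted(zip(predicted, observed))
    n = len(pairs)
    errors = []
    for b in range(n_bins):
        # Integer bin edges give bins whose sizes differ by at most one.
        start, stop = b * n // n_bins, (b + 1) * n // n_bins
        chunk = pairs[start:stop]
        if not chunk:
            continue  # more bins than sequences
        mean_predicted = sum(p for p, _ in chunk) / len(chunk)
        observed_frequency = sum(o for _, o in chunk) / len(chunk)
        errors.append(abs(observed_frequency - mean_predicted))
    return sum(errors) / len(errors)


# Toy data: two low-probability sequences without the mutation,
# two high-probability sequences with it.
error = average_absolute_error([0.1, 0.9, 0.1, 0.9], [0, 1, 0, 1], n_bins=2)
print(round(error, 3))
```

Because every bin contains about the same number of sequences, no bin's observed frequency rests on only a handful of sequences, which is the sampling problem the fixed-width bins of Figure 1B can run into for rare mutations.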
Entrenchment and the effect of epistasis on the favorability of a primary resistance mutation.

(A) The change $\Delta E_{90} = E^{90}_{\text{wild-type}} - E^{90}_{\text{mutation}}$ for sequences, conditional on the number of PI-associated mutations, is shown as boxplots marking the first, second and third quartiles. Whiskers extend to 1.5 times the interquartile range, with outliers marked as ‘x’s and mean values marked as squares. The left ordinate shows the relative probability of reversion ($e^{-\Delta E}$), and the right shows ΔE. Sequences whose energy difference falls above ΔE=0 (dashed line) are entrenching backgrounds that favor the mutation. Sequence backgrounds where the mutation is favored on average are shown in red, the others in blue. The mutation L90M becomes favorable on average when there are about 9 PI-associated mutations, but there is a wide range of favorability, and ‘which’ PI-associated mutations are present plays an important role in determining whether the primary mutation is favored or disfavored. The highlighted regions (white with dark border) show that there are many sequence backgrounds with between 7 and 14 mutations in which L90M is either ‘highly entrenched or favored’ (top) or ‘highly disfavored’ (bottom). (B) The distributions of the number of PI-associated mutations in sequences in the ‘highly entrenching’ and ‘highly disfavoring’ regions from panel A overlap substantially, again showing that entrenchment is not primarily determined by the number of associated mutations. Predicting the likelihood of the mutant residue M from the number of PI-associated mutations alone would be especially difficult for these sequence backgrounds, where M is highly entrenched (red) or highly disfavored (blue), because of this overlap.

https://doi.org/10.7554/eLife.50524.004
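To make the quantities in panel A concrete, here is a minimal sketch of how ΔE and the relative probability of reversion could be computed for a toy Potts model. The energy form (site fields plus pairwise couplings, with lower energy meaning higher probability), the parameter layout, and all numbers are assumptions made for illustration; they are not the fitted model from the study.

```python
import math


def potts_energy(seq, fields, couplings):
    """E(S) = sum_i h_i(S_i) + sum_{i<j} J_ij(S_i, S_j); lower E means more probable."""
    energy = sum(fields[i].get(residue, 0.0) for i, residue in enumerate(seq))
    for (i, j), table in couplings.items():
        energy += table.get((seq[i], seq[j]), 0.0)
    return energy


def delta_e(seq, pos, wild_type, mutant, fields, couplings):
    """ΔE = E(background with wild-type residue) - E(background with mutant residue)."""
    with_wild_type = seq[:pos] + wild_type + seq[pos + 1:]
    with_mutant = seq[:pos] + mutant + seq[pos + 1:]
    return (potts_energy(with_wild_type, fields, couplings)
            - potts_energy(with_mutant, fields, couplings))


# Made-up parameters for a 3-residue toy protein (illustration only).
# The mutant M at site 0 is costly on its own but is stabilized by V at site 1.
fields = [{"L": 0.0, "M": 1.0}, {"A": 0.0, "V": 0.2}, {"K": 0.0}]
couplings = {(0, 1): {("M", "V"): -2.5}}

for background in ("LAK", "LVK"):
    dE = delta_e(background, 0, "L", "M", fields, couplings)
    label = "entrenching" if dE > 0 else "disfavoring"
    # exp(-ΔE) is the probability of the wild-type residue relative to the mutant.
    print(background, label, "relative reversion probability:", math.exp(-dE))
```

In this toy example the same mutation is disfavored in one background and entrenched in the other, although the two backgrounds differ at a single accessory site. This mirrors the caption's point that which associated mutations are present matters more than how many.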
Degree of entrenchment of key resistance mutations occurring in the catalytic core domain (CCD) of HIV-1 IN.

The change in Potts statistical energy ΔE for some of the key resistance mutations occurring in the catalytic core domain (CCD) of IN is plotted as a function of the rank of the mutation-carrying sequences, ranked in descending order of their favorability towards the mutant. The plot shows the degree of entrenchment for these mutations. For example, Q148R is highly entrenched in almost all sequences carrying it, whereas G163K is entrenched (ΔE>0) in only about half of the sequences carrying it.

https://doi.org/10.7554/eLife.50524.011
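The ranking used in this figure, and the fraction of carriers in which a mutation is entrenched, can be sketched in a few lines. The ΔE values below are invented for illustration and the function names are hypothetical.

```python
def rank_by_favorability(delta_es):
    """Return ΔE values sorted from most to least favorable toward the mutant."""
    return sorted(delta_es, reverse=True)


def fraction_entrenched(delta_es):
    """Fraction of mutation-carrying sequences whose background entrenches it (ΔE > 0)."""
    if not delta_es:
        raise ValueError("need at least one ΔE value")
    return sum(1 for d in delta_es if d > 0) / len(delta_es)


# Made-up ΔE values for six sequences carrying a hypothetical mutation.
toy_delta_es = [3.2, -0.4, 1.1, -2.0, 0.7, 4.5]
ranked = rank_by_favorability(toy_delta_es)
share = fraction_entrenched(toy_delta_es)
print(ranked)
print(f"{share:.0%} of carriers entrench the mutation")
```

Plotting the ranked values against their rank gives a curve of the kind shown in the figure, and the point where it crosses ΔE=0 marks the fraction of carriers in which the mutation is entrenched.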
Distribution of Potts ΔE scores for key residues associated with drug resistance in HIV-1 IN.

The distributions of the Potts ΔE scores ($\Delta E = E_{\text{wild}} - E_{\text{mut}}$) for sequences carrying the particular resistance mutation are shown in ‘green’ for the most frequently observed INSTI-selected resistance mutations in HIV IN, and in ‘blue’ for all other possible mutations at the same sites. Other possible mutations include rarely observed or unobserved mutations. The histograms show the differential distribution of ΔE scores for observed vs. unobserved/rare mutations at 15 primary mutation sites associated with evolving drug resistance in HIV-1 IN. The green (observed) and blue (unobserved/rare) distributions are normalized to the total number of primary DRMs in IN in the Stanford HIVDB and the total number of other possible mutations at the same sites, respectively. The mean ΔE scores for observed vs. unobserved mutations are +2.11 and −5.58, respectively (p<0.001). The wide distribution of ΔE scores also illustrates the role of the background in which the resistance mutation occurs.

https://doi.org/10.7554/eLife.50524.013
Entrenchment and favorability of key resistance mutations in specific backgrounds.

The change in Potts energy of a sequence, ΔE, is used as the measure of ‘entrenchment’ and favorability of key resistance mutations in HIV-1 NL4-3, HXB2, and the subtype B consensus sequence, respectively, shown as a function of the average ΔE in drug-experienced HIV-1 subtype B patient populations ($\langle \Delta E_{\text{patient}} \rangle$) in the Stanford HIVDB for viral integrase (A,B,C) and reverse transcriptase (D,E,F). In each case, the Pearson correlation coefficient is indicated. The protein sequences for the molecular clones NL4-3 and HXB2 were obtained from GenBank (accession numbers AF324493.2 and K03455.1, respectively; protein IDs for the pol polyprotein AAK08484.2 and AAB50259.1, respectively). The subtype B consensus sequence was obtained from the Stanford HIVDB. The degree of ‘entrenchment’ in these subtype B strains is often not representative of the average entrenchment effects in a patient population, or even of the most representative background from a patient population.

https://doi.org/10.7554/eLife.50524.014
The particular sequence background in which a resistance mutation occurs affects the degree of entrenchment, with often a clear distinction between sequences where the mutation is present (green) versus absent (blue) shown here for the mutations N155H (A) and G140S (B) in integrase.

The degrees of entrenchment for the subtype B consensus, NL4-3 and HXB2 are shown as ‘black dashed’, ‘red’ and ‘magenta’ lines, respectively. The Potts entrenchment score clearly separates backgrounds where a particular mutation is observed from those where it is absent, with the former more likely to present a distinct fitness advantage towards the mutation.

https://doi.org/10.7554/eLife.50524.015
Comparison of Potts models parameterized on drug-naive and drug-experienced HIV protein sequences.

The effects of point mutations are compared in terms of Potts ΔE scores (which form the basis of our study) from two different Potts models, parameterized on drug-naive and drug-experienced sequences, for the PR drug-resistance mutations M46I (A), I54V (B), V82A (C), and L90M (D), all of which appear at a frequency of at least 0.25% in both datasets. Bin shading in the 2D histograms scales logarithmically with the number of sequences whose scores fall into each bin. To obtain the ΔE scores, the sequences are scored with the drug-naive model and with the drug-experienced model. The ΔE scores from the two models are highly correlated, with Pearson correlation coefficients of 0.82 (p<0.001), 0.93 (p<0.001), 0.85 (p<0.001), and 0.82 (p<0.001), respectively.

https://doi.org/10.7554/eLife.50524.016
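The agreement between the two models is quantified with a Pearson correlation coefficient. As a minimal sketch, assuming two equally long lists of ΔE scores for the same sequences (one list per model), the coefficient can be computed as below; the scores are made up and the function name is hypothetical.

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient between two equally long score lists."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two equally long lists with at least two values")
    mean_x, mean_y = sum(x) / len(x), sum(y) / len(y)
    covariance = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    spread_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    spread_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return covariance / (spread_x * spread_y)


# Made-up ΔE scores for the same five sequences under two models.
delta_e_naive_model = [-2.1, -0.5, 0.3, 1.8, 3.0]
delta_e_experienced_model = [-1.7, -0.9, 0.8, 1.5, 3.4]
print(round(pearson_r(delta_e_naive_model, delta_e_experienced_model), 3))
```

A coefficient close to 1 means that the two models order the sequence backgrounds in nearly the same way, even if the absolute ΔE values differ.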
Drug-pressure associated mutations are largely common between drugs of the same class.

The mutations (both primary and associated) arising in drug-treated HIV proteases in response to inhibitor treatment are shown for each protease drug. Circle diameters represent the number of mutations at each site in sequences treated with the particular drug. Most mutations that occur in response to one drug also occur under treatment with another drug of the same class. This shows that ‘spurious correlations’ are minimal; a Potts model built on a mixture of patient sequences treated with different drugs of the same class could pick up such correlations if the mutations occurring in response to one drug were never observed in response to another.

https://doi.org/10.7554/eLife.50524.017
Appendix 1—figure 1
Sequence coverage for RT.

The figure shows the sequence coverage (number of sequences vs number of residues) for drug-experienced RT sequences (exposed to both NRTIs and NNRTIs) derived from the Stanford HIVDB (22,444 isolates from 20,422 patients). For RT, sequences with exposure to both NRTIs and NNRTIs were selected because an alternative search for RT sequences exposed to only NRTIs or only NNRTIs would return far fewer isolates (5,398 and 80, respectively). Sequences with insertions (‘#’) and deletions (‘~’) are removed. MSA columns and rows with more than 1% gaps (‘.’) are removed. This results in a final MSA of N = 19,194 sequences from 17,130 persons, each of length L = 188, for RT. To retain enough sequence coverage in the MSA, we removed residues 1–38 and residues 227 onwards for RT. For this reason, some interesting DRMs in RT, such as F227I/L/V/C, L234I, P236L or N348I (NNRTI-affected), are not amenable to our analysis.

https://doi.org/10.7554/eLife.50524.020
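The filtering steps described above can be sketched as follows. This is a hedged illustration rather than the authors' pipeline: the caption does not state whether columns or rows are filtered first, so the order below (columns, then rows) is an assumption, as are the function name and the toy alignment.

```python
def filter_msa(sequences, max_gap_fraction=0.01):
    """Filter an alignment of equally long strings.

    1. Drop sequences containing insertion ('#') or deletion ('~') markers.
    2. Drop columns whose gap ('.') fraction exceeds max_gap_fraction.
    3. Drop rows whose gap fraction exceeds max_gap_fraction.
    """
    kept = [s for s in sequences if "#" not in s and "~" not in s]
    if not kept:
        return []
    length = len(kept[0])
    good_columns = [
        c for c in range(length)
        if sum(s[c] == "." for s in kept) / len(kept) <= max_gap_fraction
    ]
    if not good_columns:
        return []
    trimmed = ["".join(s[c] for c in good_columns) for s in kept]
    return [s for s in trimmed if s.count(".") / len(s) <= max_gap_fraction]


# Toy alignment with a loose threshold so the effect is visible on five sequences.
toy_msa = ["ABCD", "AB.D", "A#CD", "ABCD", "ABCD"]
print(filter_msa(toy_msa, max_gap_fraction=0.2))
```

With the 1% threshold from the caption, a column is kept only if at most 1 in 100 sequences has a gap there, which is why poorly covered regions at the ends of RT drop out of the alignment.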

Tables

Table 1
Confirmation of the Potts model entrenchment predictions for the mutation L90M in HIV-1 protease.

Analytical predictions of likelihoods of the mutation (using the Potts model) are shown along with the corresponding observed frequencies in different subsets of our dataset classified as entrenching (ΔE>0) or disfavoring (ΔE<0) for the mutation. The agreement between the predicted and observed frequencies is remarkable and serves as a confirmation for the Potts model entrenchment predictions.

https://doi.org/10.7554/eLife.50524.005
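| Sequences | Classification | Number of sequences | Predicted frequency | Observed frequency |
| All | Entrenched | 1475 | 82.7% | 83.9% |
| All | Disfavored | 3283 | 12.7% | 11.9% |
| All with $\lvert\Delta E\rvert > 1\sigma$ of $\Delta E = 0$ | Entrenched | 560 | 97.6% | 97.3% |
| All with $\lvert\Delta E\rvert > 1\sigma$ of $\Delta E = 0$ | Disfavored | 1444 | 3.5% | 1.9% |
| With between 7 and 14 mutations and $\lvert\Delta E\rvert > 1\sigma$ of $\Delta E = 0$ | Entrenched | 470 | 97.5% | 97.0% |
| With between 7 and 14 mutations and $\lvert\Delta E\rvert > 1\sigma$ of $\Delta E = 0$ | Disfavored | 239 | 2.8% | 2.1% |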
Table 2
Entrenchment in the population (of sequences carrying the mutation) for four major classes of resistance mutations in HIV-1 subtype B. A DRM is entrenched in the population if ~50% or more of the sequences containing the mutation entrench it (i.e., $\Delta E = E_{\text{wild}} - E_{\text{mut}} > 0$).

We catalog entrenchment in the population (of sequences carrying the mutation) for all primary DRMs appearing at ~1% frequency or more. The study reveals that mutations arising in response to NNRTIs are much less entrenched in the population (of sequences carrying the mutation) than those against the other drug classes (1 of 15 primary DRMs, versus 11 of 18 for NRTIs, 10 of 13 for PIs and 13 of 15 for INSTIs).

https://doi.org/10.7554/eLife.50524.006
| | NRTIs | NNRTIs | PIs | INSTIs |
| Years in therapy | 32 | 23 | 24 | 12 |
| Number of primary DRMs | 18 | 15 | 13 | 15 |
| DRMs entrenched in the population (of sequences carrying the mutation) | 11 (61.1%) | 1 (6.7%) | 10 (76.9%) | 13 (86.7%) |
| Number of DRMs conferring high-level resistance | 7 | 11 | 7 | 6 |
| High-level resistance DRMs entrenched in the population (of sequences carrying the mutation) | 3 (42.8%) | 1 (9.1%) | 5 (71.4%) | 5 (83.3%) |
  1. Source: Results shown in this table are based on calculations for ‘entrenchment in the population’ (of sequences carrying the mutation) for each primary DRM occurring at ~1% frequency or more in HIV-1 RT, PR, and IN, respectively as shown in Table 2—source data 1, Table 2—source data 2, Table 2—source data 3 and Table 2—source data 4. Resistance levels are determined according to Stanford HIVDB (Rhee et al., 2003; Shafer, 2006) mutation scores for PIs, NRTIs, NNRTIs, and INSTIs. 

Table 2—source data 1

Table showing entrenchment in the population (of sequences carrying the mutation) for primary resistance mutations against NRTIs.

https://doi.org/10.7554/eLife.50524.007
Table 2—source data 2

Table showing entrenchment in the population (of sequences carrying the mutation) for primary resistance mutations against NNRTIs.

https://doi.org/10.7554/eLife.50524.008
Table 2—source data 3

Table showing entrenchment in the population (of sequences carrying the mutation) for primary resistance mutations against PIs.

https://doi.org/10.7554/eLife.50524.009
Table 2—source data 4

Table showing entrenchment in the population (of sequences carrying the mutation) for primary resistance mutations against INSTIs.

https://doi.org/10.7554/eLife.50524.010
Table 3
Entrenchment of at least one primary DRM for each of the four drug classes in HIV-1. The table shows the percentage of drug-experienced sequences that contain at least one primary DRM, and the percentage of drug-experienced sequences that contain at least one primary DRM entrenched by its respective background.

This is shown separately for resistance mutations occurring in response to each of the four HIV drug classes. A significantly lower percentage (~28%) of the patient sequences carry at least one entrenched resistance mutation conferring resistance to the NNRTIs when compared to other drugs (~50%–65%).

https://doi.org/10.7554/eLife.50524.012
| | NRTIs | NNRTIs | PIs | INSTIs |
| Number of sequences in MSA | 19194 | 19194 | 4758 | 1220 |
| % of total sequences containing at least one primary DRM | 78.5% | 62.4% | 64.8% | 61.7% |
| % of total sequences with at least one entrenched primary DRM | 64.7% | 28.1% | 50.8% | 47.2% |
Appendix 1—table 1
Most entrenched (ME) and least entrenched (LE) or most disfavoring sequence pairs with the same Hamming distances (HD) for primary resistance mutations against the INSTIs
https://doi.org/10.7554/eLife.50524.021
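| Position | Consensus | E92Q ME | E92Q LE | L74M ME | L74M LE | N155H ME | N155H LE | Q148H ME | Q148H LE | G140S ME | G140S LE |
| 6 | D | -- | -- | -- | E | -- | -- | -- | -- | -- | -- |
| 11 | E | -- | -- | -- | -- | -- | -- | -- | D | -- | D |
| 17 | S | -- | -- | N | N | N | N | -- | T | -- | -- |
| 20 | R | -- | -- | -- | -- | -- | -- | -- | -- | K | -- |
| 22 | M | -- | -- | I | -- | -- | -- | -- | -- | -- | -- |
| 23 | A | -- | -- | V | -- | -- | -- | -- | -- | -- | -- |
| 28 | L | -- | -- | I | -- | -- | -- | -- | -- | -- | -- |
| 31 | V | -- | -- | -- | -- | I | -- | I | -- | I | -- |
| 32 | V | -- | -- | -- | -- | -- | -- | -- | -- | -- | I |
| 37 | V | -- | -- | -- | -- | -- | -- | -- | -- | -- | I |
| 39 | S | C | C | -- | -- | -- | -- | -- | -- | -- | -- |
| 45 | L | Q | -- | V | -- | -- | -- | -- | -- | -- | -- |
| 50 | M | -- | -- | M | I | -- | -- | I | -- | L | -- |
| 63 | L | I | -- | -- | -- | -- | -- | -- | -- | -- | -- |
| 72 | I | L | -- | -- | V | -- | V | -- | -- | -- | -- |
| 74 | L | M | -- | M | M | -- | -- | -- | -- | -- | -- |
| 92 | E | Q | Q | -- | -- | -- | -- | -- | -- | -- | -- |
| 97 | T | -- | -- | A | A | -- | -- | -- | -- | -- | -- |
| 101 | L | -- | I | I | -- | I | -- | -- | I | I | I |
| 119 | S | -- | -- | T | R | -- | -- | -- | P | -- | -- |
| 124 | T | -- | N | -- | -- | -- | A | -- | N | -- | -- |
| 135 | I | -- | V | -- | -- | -- | -- | -- | V | -- | -- |
| 136 | K | -- | -- | -- | -- | -- | -- | -- | -- | Q | -- |
| 138 | E | -- | -- | D | -- | -- | -- | K | -- | -- | -- |
| 140 | G | -- | -- | -- | -- | -- | S | S | -- | S | S |
| 143 | Y | -- | -- | R | -- | -- | -- | -- | -- | -- | -- |
| 148 | Q | -- | -- | -- | -- | -- | H | H | H | H | R |
| 151 | V | -- | -- | -- | I | I | -- | -- | -- | -- | -- |
| 154 | M | -- | -- | -- | -- | -- | I | -- | -- | -- | -- |
| 155 | N | -- | H | -- | H | H | H | -- | -- | -- | -- |
| 156 | K | -- | -- | -- | N | -- | -- | -- | -- | -- | -- |
| 170 | E | -- | -- | -- | -- | -- | A | -- | -- | -- | -- |
| 181 | F | -- | L | -- | -- | -- | -- | -- | L | -- | -- |
| 188 | K | -- | -- | -- | -- | -- | -- | R | -- | -- | -- |
| 196 | A | -- | -- | -- | -- | P | -- | -- | -- | -- | -- |
| 201 | V | -- | I | -- | -- | I | I | I | I | -- | I |
| 206 | T | -- | -- | -- | -- | -- | -- | S | -- | -- | -- |
| 208 | I | -- | -- | -- | M | -- | -- | -- | -- | L | -- |
| 212 | E | -- | -- | -- | A | -- | -- | -- | -- | -- | -- |
| 215 | K | -- | -- | -- | -- | -- | -- | S | -- | -- | -- |
| 216 | Q | N | -- | -- | -- | -- | -- | -- | -- | -- | -- |
| 220 | I | -- | -- | -- | -- | -- | -- | -- | L | -- | -- |
| 230 | S | -- | -- | -- | -- | -- | -- | G | -- | -- | -- |
| 232 | D | -- | -- | -- | -- | N | -- | -- | -- | -- | E |
| 234 | L | I | -- | -- | -- | -- | -- | -- | -- | -- | -- |
| 256 | D | -- | -- | E | -- | E | -- | -- | -- | E | E |
| HD | | 8 | 8 | 12 | 12 | 9 | 9 | 10 | 10 | 9 | 9 |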
  1. Residues identical to the consensus are shown as ‘--’; mutations are shown as the one-letter code of the mutant residue (primary drug-resistance mutations appearing at more than 1% frequency are in bold).
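The Hamming distance (HD) used to pair the sequences in these appendix tables is the number of positions at which a sequence differs from the consensus. A minimal sketch, assuming aligned sequences of equal length; the six-residue sequences below are invented, and the helper names are hypothetical.

```python
def hamming_distance(seq, consensus):
    """Number of aligned positions where seq differs from the consensus."""
    if len(seq) != len(consensus):
        raise ValueError("sequences must be aligned to the same length")
    return sum(a != b for a, b in zip(seq, consensus))


def list_mutations(seq, consensus, first_position=1):
    """Mutations relative to the consensus in the usual notation, e.g. 'R4K'."""
    return [
        f"{c}{i}{s}"
        for i, (s, c) in enumerate(zip(seq, consensus), start=first_position)
        if s != c
    ]


# Toy example: two backgrounds at the same Hamming distance from the consensus
# that nevertheless carry different mutations.
consensus = "DESRMA"
background_a = "DESKMV"
background_b = "EESRMV"
print(hamming_distance(background_a, consensus), list_mutations(background_a, consensus))
print(hamming_distance(background_b, consensus), list_mutations(background_b, consensus))
```

Two sequences with equal HD can differ in which positions are mutated, which is why the tables pair a most entrenched and a least entrenched sequence at the same HD.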

Appendix 1—table 2
Most entrenched (ME) and least entrenched (LE) or most disfavoring sequence pairs with the same Hamming distances (HD) for primary resistance mutations against NNRTIs
https://doi.org/10.7554/eLife.50524.022
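| Position | Consensus | P225H ME | P225H LE | Y188L ME | Y188L LE | L100I ME | L100I LE | K103N/E ME | K103N/E LE | Y181C/G ME | Y181C/G LE | E138K/R ME | E138K/R LE |
| 39 | T | -- | -- | -- | -- | -- | S | -- | -- | -- | -- | -- | -- |
| 41 | M | -- | -- | L | L | L | L | -- | -- | -- | -- | -- | L |
| 43 | K | -- | -- | -- | -- | -- | R | -- | -- | -- | -- | -- | -- |
| 44 | E | -- | -- | -- | D | -- | -- | -- | -- | -- | -- | -- | -- |
| 48 | S | -- | -- | -- | -- | -- | -- | T | -- | -- | -- | -- | -- |
| 49 | K | -- | -- | -- | -- | -- | -- | -- | R | -- | -- | -- | -- |
| 50 | I | -- | -- | -- | -- | -- | -- | -- | -- | -- | -- | -- | -- |
| 58 | T | -- | -- | N | -- | -- | -- | -- | -- | -- | -- | -- | -- |
| 60 | V | -- | -- | -- | -- | -- | -- | -- | -- | -- | I | -- | -- |
| 62 | A | -- | -- | V | -- | -- | -- | -- | -- | V | -- | -- | -- |
| 65 | K | -- | -- | -- | -- | -- | -- | -- | -- | R | -- | -- | -- |
| 67 | D | -- | -- | -- | G | N | -- | -- | -- | -- | -- | -- | -- |
| 68 | S | -- | -- | -- | G | -- | -- | -- | -- | G | -- | -- | -- |
| 69 | T | -- | -- | N | -- | N | -- | -- | -- | I | N | -- | -- |
| 70 | K | -- | -- | -- | T | -- | -- | -- | -- | -- | R | -- | -- |
| 74 | L | -- | V | -- | V | V | -- | V | -- | -- | -- | -- | V |
| 75 | V | -- | -- | I | -- | T | -- | -- | -- | A | -- | -- | -- |
| 77 | F | -- | -- | L | -- | -- | -- | -- | -- | -- | -- | -- | -- |
| 82 | K | -- | -- | -- | -- | -- | -- | -- | -- | -- | -- | R | -- |
| 90 | V | -- | -- | -- | -- | -- | -- | I | -- | I | -- | -- | -- |
| 98 | A | S | G | -- | -- | -- | S | -- | S | -- | -- | -- | -- |
| 100 | L | -- | -- | -- | -- | I | I | I | -- | -- | -- | -- | -- |
| 101 | K | Q | -- | -- | -- | -- | -- | -- | E | E | -- | E | -- |
| 103 | K | N | -- | -- | N | N | -- | N | N | -- | N | R | -- |
| 104 | K | -- | -- | -- | -- | -- | -- | -- | -- | N | -- | -- | -- |
| 106 | V | -- | -- | I | -- | -- | -- | -- | A | -- | -- | -- | -- |
| 108 | V | -- | -- | -- | -- | -- | -- | -- | -- | I | -- | -- | -- |
| 116 | F | -- | -- | W | -- | -- | -- | -- | -- | -- | -- | -- | -- |
| 118 | V | -- | -- | -- | I | -- | -- | -- | -- | -- | -- | -- | -- |
| 122 | K | -- | -- | -- | E | E | -- | E | -- | -- | -- | E | E |
| 123 | D | -- | -- | -- | -- | -- | E | -- | -- | -- | -- | -- | -- |
| 126 | K | -- | -- | -- | -- | -- | -- | -- | -- | -- | -- | R | -- |
| 135 | I | L | -- | -- | K | M | V | -- | -- | -- | T | -- | -- |
| 138 | E | -- | -- | -- | -- | -- | -- | -- | K | -- | -- | K | K |
| 139 | T | -- | -- | -- | -- | -- | -- | -- | -- | -- | -- | E | -- |
| 142 | I | -- | -- | -- | -- | -- | Q | -- | -- | -- | V | T | -- |
| 151 | Q | -- | -- | M | -- | -- | -- | -- | -- | -- | -- | -- | -- |
| 158 | A | -- | S | -- | -- | -- | -- | -- | -- | -- | -- | -- | S |
| 162 | S | -- | -- | -- | -- | -- | C | -- | -- | -- | -- | -- | A |
| 165 | T | -- | I | -- | -- | -- | K | -- | -- | -- | -- | -- | -- |
| 166 | K | -- | -- | -- | -- | -- | -- | -- | Q | -- | -- | -- | -- |
| 169 | E | -- | -- | -- | A | -- | -- | -- | -- | -- | -- | -- | -- |
| 172 | R | -- | -- | K | -- | -- | -- | -- | -- | -- | -- | -- | -- |
| 173 | K | -- | -- | N | -- | -- | -- | -- | -- | -- | -- | -- | -- |
| 176 | P | Q | -- | -- | -- | -- | -- | -- | -- | -- | -- | -- | -- |
| 177 | D | -- | -- | -- | -- | -- | -- | N | -- | -- | -- | -- | -- |
| 178 | I | -- | -- | -- | -- | -- | -- | M | -- | -- | -- | M | -- |
| 179 | V | -- | D | -- | I | -- | D | -- | -- | -- | -- | -- | -- |
| 180 | I | -- | -- | -- | -- | -- | -- | -- | -- | -- | M | -- | -- |
| 181 | Y | -- | C | -- | C | -- | -- | -- | -- | C | C | -- | -- |
| 184 | M | V | -- | -- | -- | V | I | I | -- | -- | V | -- | -- |
| 188 | Y | -- | -- | L | L | -- | -- | -- | -- | -- | L | -- | -- |
| 189 | V | -- | -- | I | -- | -- | -- | -- | -- | -- | -- | -- | -- |
| 190 | G | -- | A | -- | S | -- | -- | -- | A | A | -- | -- | -- |
| 196 | G | E | -- | -- | -- | -- | -- | -- | -- | -- | -- | -- | -- |
| 197 | Q | -- | -- | -- | -- | -- | -- | -- | K | -- | -- | -- | -- |
| 200 | T | A | V | A | -- | A | A | A | A | -- | A | -- | A |
| 202 | I | -- | -- | V | -- | V | -- | -- | -- | -- | -- | -- | -- |
| 203 | E | -- | -- | -- | -- | D | -- | -- | -- | -- | -- | -- | -- |
| 207 | Q | -- | -- | -- | -- | -- | A | -- | -- | -- | -- | -- | -- |
| 210 | L | -- | -- | -- | W | W | -- | -- | -- | -- | -- | -- | -- |
| 211 | R | -- | -- | -- | K | -- | K | -- | -- | K | K | -- | K |
| 214 | F | -- | -- | L | -- | -- | -- | -- | -- | -- | L | -- | -- |
| 215 | T | -- | -- | Y | Y | Y | D | -- | -- | -- | -- | -- | Y |
| 219 | K | -- | -- | -- | -- | E | -- | -- | -- | Q | Q | -- | -- |
| 221 | H | -- | -- | -- | -- | -- | -- | -- | -- | Y | -- | -- | -- |
| 223 | K | -- | -- | T | -- | -- | -- | -- | -- | -- | -- | -- | -- |
| 225 | P | H | H | -- | -- | -- | -- | H | -- | -- | -- | -- | -- |
| HD | | 9 | 9 | 18 | 18 | 16 | 16 | 11 | 11 | 14 | 14 | 9 | 9 |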
  1. Residues identical to the consensus are shown as ‘--’; mutations are shown as the one-letter code of the mutant residue (primary drug-resistance mutations appearing at more than 1% frequency are in bold).

Appendix 1—table 3
Most entrenched (ME) and least entrenched (LE) or most disfavoring sequence pairs with the same Hamming distances (HD) for primary resistance mutations against PIs
https://doi.org/10.7554/eLife.50524.023
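| Position | Consensus | I50V ME | I50V LE | I84V ME | I84V LE | L90M ME | L90M LE | V82A ME | V82A LE |
| 10 | L | I | -- | F | -- | I | -- | I | I |
| 13 | I | -- | V | -- | -- | V | -- | -- | V |
| 15 | I | -- | V | -- | V | -- | V | -- | -- |
| 18 | Q | -- | -- | -- | -- | H | -- | -- | H |
| 19 | L | -- | I | -- | -- | -- | -- | -- | -- |
| 20 | K | -- | -- | -- | -- | I | R | -- | -- |
| 24 | L | -- | -- | -- | -- | -- | -- | I | -- |
| 30 | D | -- | -- | -- | N | -- | -- | -- | -- |
| 32 | V | -- | -- | -- | -- | -- | I | -- | -- |
| 33 | L | -- | F | -- | F | -- | -- | F | -- |
| 34 | E | Q | -- | -- | -- | -- | -- | -- | -- |
| 35 | E | -- | D | -- | D | -- | D | -- | -- |
| 36 | M | -- | -- | -- | L | -- | I | -- | -- |
| 37 | N | -- | S | -- | E | -- | D | -- | -- |
| 41 | R | -- | K | -- | -- | -- | K | -- | -- |
| 43 | K | -- | -- | -- | T | -- | -- | -- | -- |
| 46 | M | I | -- | I | -- | -- | L | L | I |
| 47 | I | -- | -- | -- | -- | -- | A | -- | -- |
| 48 | G | V | -- | -- | -- | -- | -- | -- | -- |
| 50 | I | V | V | -- | -- | -- | -- | -- | -- |
| 53 | F | Y | -- | -- | -- | -- | -- | L | -- |
| 54 | I | S | -- | V | -- | V | -- | V | -- |
| 55 | K | -- | -- | R | -- | -- | -- | -- | -- |
| 57 | R | -- | -- | -- | -- | -- | K | -- | -- |
| 58 | Q | -- | E | -- | -- | -- | -- | -- | -- |
| 60 | D | -- | -- | -- | E | -- | -- | -- | -- |
| 61 | Q | -- | -- | -- | -- | -- | Y | -- | -- |
| 62 | I | V | -- | -- | V | V | -- | -- | -- |
| 63 | L | Q | P | P | P | P | -- | P | P |
| 64 | I | -- | -- | V | -- | -- | -- | -- | -- |
| 66 | I | -- | -- | V | -- | -- | -- | -- | -- |
| 67 | C | -- | -- | -- | -- | F | -- | -- | -- |
| 71 | A | V | T | -- | -- | V | -- | V | -- |
| 72 | I | V | V | -- | -- | K | -- | -- | -- |
| 73 | G | -- | -- | T | -- | S | -- | -- | -- |
| 74 | T | S | -- | -- | -- | -- | -- | -- | -- |
| 76 | L | -- | -- | -- | -- | -- | -- | -- | V |
| 77 | V | I | -- | -- | -- | -- | I | -- | -- |
| 79 | P | -- | -- | A | -- | A | -- | -- | -- |
| 82 | V | A | I | -- | -- | -- | I | A | A |
| 84 | I | -- | -- | V | V | V | -- | -- | V |
| 88 | N | -- | -- | -- | D | -- | -- | -- | -- |
| 89 | L | -- | V | -- | -- | -- | -- | -- | I |
| 90 | L | -- | -- | M | M | M | M | -- | -- |
| 93 | I | L | L | L | -- | -- | -- | -- | -- |
| 95 | C | -- | -- | F | -- | -- | -- | -- | -- |
| HD | | 15 | 15 | 13 | 13 | 14 | 14 | 9 | 9 |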
  1. Residues identical to the consensus are shown as ‘--’; mutations are shown as the one-letter code of the mutant residue (primary drug-resistance mutations appearing at more than 1% frequency are in bold).

Data availability

Sequence data analyzed in this study were obtained from the Stanford University HIV drug resistance database (https://hivdb.stanford.edu/) and the Los Alamos HIV sequence database (https://www.hiv.lanl.gov/content/sequence/HIV/mainpage.html). Source data tables are provided for Table 2.

The following previously published data sets were used
  1. S-Y Rhee, MJ Gonzales, R Kantor, BJ Betts, J Ravela, RW Shafer (2003)
    Stanford HIV drug resistance database
    Stanford University HIV drug resistance database: Genotype-Treatment Correlations.
  2. B Foley, T Leitner, C Apetrei, B Hahn, I Mizrachi, J Mullins, A Rambaut, S Wolinsky, B Korber (2004)
    Los Alamos HIV sequence database
    Consensus and Ancestral Sequence Alignments. Select 'Alignment type: Consensus/Ancestral', 'Organism: HIV-1/SIVcpz', 'Pre-defined region of the genome: POL', 'Subtype: All', 'DNA/Protein: Protein'.
