Illustrative example of the forward-in-time simulation of protein evolution, which integrates a birth-death population process, with fitness derived from protein folding stability, and a structurally constrained substitution (SCS) model of protein evolution.

Given a protein variant assigned to a node at time t (blue node), its fitness is calculated from its protein folding stability. The fitness is then used to determine the birth and death rates for that variant, which provide the time to the next birth or death event (horizontal dashed line); this time corresponds to the forward-in-time branch length. Next, the variant is evolved forward in time toward each descendant, along the previously determined branch length, under an SCS model of protein evolution. The process is repeated, forward in time, starting at each new variant. If a death event occurs, the variant of the extinct node (pink node) is obtained but it does not have descendants. The process finishes when a particular sample size or simulation time (i.e., t+n) is reached.
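The steps described in this legend can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the implementation used in the study: `toy_fitness` is a hypothetical stand-in for a stability-based fitness, `mutate` is a placeholder for an SCS substitution model, and the mapping birth rate = fitness, death rate = 1 - fitness is assumed for simplicity.

```python
import random
from collections import deque

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

def toy_fitness(seq, target):
    """Hypothetical fitness in (0, 1]: similarity to a target sequence.
    A real analysis would derive fitness from folding stability."""
    matches = sum(a == b for a, b in zip(seq, target))
    return (matches + 1) / (len(seq) + 1)

def mutate(seq, branch_length, rate, rng):
    """Placeholder for an SCS model: each site is redrawn uniformly with
    probability rate * branch_length (capped at 1)."""
    p = min(1.0, rate * branch_length)
    return "".join(rng.choice(AMINO_ACIDS) if rng.random() < p else aa
                   for aa in seq)

def simulate(root, target, max_time, max_nodes=200, rate=0.05, seed=0):
    rng = random.Random(seed)
    records = []                       # (time, sequence, fate) per processed node
    active = deque([(0.0, root)])      # variants still to be processed
    while active and len(records) < max_nodes:
        t, seq = active.popleft()
        fit = toy_fitness(seq, target)
        birth, death = fit, 1.0 - fit            # assumed fitness-to-rate mapping
        wait = rng.expovariate(birth + death)    # time to next event = branch length
        if t + wait >= max_time:
            # No event before the end of the simulation: sample the variant.
            records.append((max_time, mutate(seq, max_time - t, rate, rng), "sampled"))
        elif rng.random() < birth / (birth + death):
            # Birth: the variant evolves toward each of two descendants.
            records.append((t, seq, "parent"))
            for _ in range(2):
                active.append((t + wait, mutate(seq, wait, rate, rng)))
        else:
            # Death: the extinct variant is obtained but leaves no descendants.
            records.append((t + wait, mutate(seq, wait, rate, rng), "extinct"))
    return records

records = simulate("ACDEFGHIKL", "ACDEFGHIKL", max_time=3.0)
```

Each record holds the time, sequence, and fate of one node; the loop stops when either the simulation time or the node limit is reached, mirroring the two stopping criteria named above.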

Comparison of real and predicted sequences of the HIV-1 MA protein considering predictions based on the SCS and neutral models.

For the data simulated under the SCS and neutral models, the table shows the Grantham distance between the amino acids that changed during the real and predicted evolutionary trajectories, and the Kullback-Leibler (KL) divergence between the real and predicted multiple sequence alignments. It also shows the folding stability (ΔG) of the real protein variants at times T1 and T31, and the folding stability of the predicted protein variants at time T31. The error corresponds to the 95% confidence interval of the mean of the predicted folding stabilities.
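As a rough illustration of the KL divergence between two alignments, the sketch below compares pooled amino acid frequencies. The pooling over all sites, the pseudocount, and the function names are assumptions made for this example; the legend does not specify how the divergence was computed in the study.

```python
import math
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

def aa_frequencies(msa, pseudocount=1.0):
    """Pooled amino acid frequencies of an alignment (list of sequences).
    The pseudocount keeps every frequency non-zero; gaps are ignored."""
    counts = Counter(aa for seq in msa for aa in seq if aa in AMINO_ACIDS)
    total = sum(counts.values()) + pseudocount * len(AMINO_ACIDS)
    return {aa: (counts[aa] + pseudocount) / total for aa in AMINO_ACIDS}

def kl_divergence(p, q):
    """KL(p || q) in nats for two distributions over the 20 amino acids."""
    return sum(p[aa] * math.log(p[aa] / q[aa]) for aa in AMINO_ACIDS)

real = aa_frequencies(["MGARASVLSG", "MGARASVLTG"])       # toy "real" alignment
predicted = aa_frequencies(["MGARASILSG", "MGSRASVLSG"])  # toy "predicted" alignment
divergence = kl_divergence(real, predicted)
```

A value of zero means the two frequency distributions are identical; larger values indicate a greater mismatch between the real and predicted alignments.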

Prediction error of SARS-CoV-2 Mpro and PLpro evolution regarding physicochemical properties of the amino acid changes accumulated during the evolutionary trajectories and protein folding stability.

(A) Grantham distance calculated from the amino acid changes that occurred during the real and predicted evolutionary trajectories based on SCS and neutral models of protein evolution. (B) Variation of protein folding stability (ΔΔG) between real and predicted protein variants based on SCS and neutral models of protein evolution. Note that a positive ΔΔG indicates that the real protein variants are more stable than the predicted protein variants, and vice versa. Error bars correspond to the 95% confidence interval of the mean prediction error from 100 multiple sequence alignments simulated for the corresponding population and time.
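The error bars can be illustrated with a short calculation. The sketch below assumes that ΔΔG is the predicted ΔG minus the real ΔG, with more negative ΔG meaning greater stability (so that a positive ΔΔG means the real variant is more stable), and uses a normal approximation for the 95% confidence interval of the mean. Both the sign convention and the function name are assumptions for this example.

```python
import math
import statistics

def ddg_mean_ci(dg_real, dg_predicted, z=1.96):
    """Mean ΔΔG and half-width of its 95% confidence interval.
    dg_predicted holds one predicted ΔG per simulated replicate."""
    ddg = [dg - dg_real for dg in dg_predicted]   # assumed sign convention
    mean = statistics.fmean(ddg)
    half_width = z * statistics.stdev(ddg) / math.sqrt(len(ddg))
    return mean, half_width

# Toy values: real ΔG of -6.0 and two predicted replicates.
mean, half_width = ddg_mean_ci(-6.0, [-4.0, -6.0])
```

With 100 replicates per population and time, as in the figure, the normal approximation is reasonable; for very small samples a t-based interval would be more appropriate.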

Prediction error of HIV-1 PR evolution in diverse populations regarding physicochemical properties of the amino acid changes accumulated during the evolutionary trajectories and protein folding stability.

(A) For each viral population (patient, represented with a particular color) and time, Grantham distance calculated from the amino acid changes that occurred during the real and predicted evolutionary trajectories. For each population, the mean of the distances obtained over time is shown on the right. (B) Relationship between Grantham distances and the accumulated number of substitution events (R² = 0.0001, which indicates a lack of correlation between these parameters). (C) Variation of protein folding stability (ΔΔG) between real and predicted protein variants for each viral population and time. For each population, the mean of the values obtained over time is shown on the right. Note that a positive ΔΔG indicates that the real protein variants are more stable than the predicted protein variants, and vice versa. Error bars correspond to the 95% confidence interval of the mean prediction error from 100 multiple sequence alignments simulated for the corresponding viral population and time.